ENB: Studying the performance of Vitalije's code

Edward K. Ream Thu, 31 May 2018 03:03:01 -0700

I spent most of yesterday studying the performance of Vitalije's prototype 
code.


The only truly important performance metric is how long it takes 
miniTkLeo.py to load a substantial .leo file. I changed miniTkLeo.py so 
that it loads my private leoPy.leo file if no file is given on the command 
line.
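A minimal sketch of that kind of fallback, assuming the file name comes from sys.argv. The function name choose_file and the default path are hypothetical and are not taken from miniTkLeo.py:

```python
import os
import sys

# Hypothetical default; the real location of leoPy.leo differs per machine.
DEFAULT_LEO_FILE = os.path.expanduser('~/leo-editor/leo/core/leoPy.leo')

def choose_file(argv, default=DEFAULT_LEO_FILE):
    """Return the .leo file named on the command line, else the default."""
    # argv[0] is the script name; argv[1], if present, is the file to load.
    return argv[1] if len(argv) > 1 else default

if __name__ == '__main__':
    fname = choose_file(sys.argv)
    print('loading', fname)
```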

On my machine, it takes 0.6 to 0.7 seconds to load this file *and all 
external files*. This performance is why Vitalije and I are excited about 
the code.

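It's not possible to use cProfile directly on miniTkLeo.py, because the 
loading happens in a separate thread (via Python's threading and queue 
modules), and cProfile profiles only the thread in which it is started. 
Instead, I added profiling code to the loadex function, like this: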

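import cProfile
import os
import pstats

# LeoTreeModel, ltmbytes, fname, loadExternalFiles and G are defined
# elsewhere in miniTkLeo.py.

def loadex():
    '''The target of threading.Thread.'''
    if 0: # Change to 1 to profile the code.
        # Profile here, inside the thread's target, where the work happens.
        cProfile.runctx('loadex_helper()',
            globals(),
            locals(),
            'profile_stats', # 'profile-%s.out' % process_name
        )
        print('===== writing profile_stats')
        p = pstats.Stats('profile_stats')
        p.strip_dirs().sort_stats('tottime').print_stats(50)
            # To limit the report to leoDataModel.py, use instead:
            # .print_stats('leoDataModel.py', 50)
    else:
        loadex_helper()

def loadex_helper():
    '''Build the outline from ltmbytes, load all external files,
    then hand the model to the main thread through the queue G.q.'''
    ltm2 = LeoTreeModel.frombytes(ltmbytes)
    loaddir = os.path.dirname(fname)
    loadExternalFiles(ltm2, loaddir)
    G.q.put(ltm2)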
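The same idea can also be expressed with a cProfile.Profile object enabled inside the thread's target, which avoids writing a stats file. This is a minimal sketch, not the code above: fake_load stands in for loadex_helper, and fake_load, profiled_target and the queue results are hypothetical names.

```python
import cProfile
import io
import pstats
import queue
import threading

results = queue.Queue()

def fake_load(n):
    """Hypothetical stand-in for loadex_helper: do some work, queue a result."""
    total = sum(i * i for i in range(n))
    results.put(total)

def profiled_target(n, report):
    """Thread target: profile fake_load in *this* thread, then summarize."""
    profiler = cProfile.Profile()
    profiler.enable()          # profiles the current (worker) thread only
    try:
        fake_load(n)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.strip_dirs().sort_stats('tottime').print_stats(10)
    report.append(stream.getvalue())

report = []
worker = threading.Thread(target=profiled_target, args=(100_000, report))
worker.start()
worker.join()
print(results.get())
print(report[0])
```

As in loadex, the key design point is that profiling starts in the thread that does the work, not in the main thread.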

With profiling enabled, the load time on my machine is 0.9 seconds 
instead of 0.6 to 0.7 seconds.

This code produces the following statistics, edited to show only the 
highlights:

1. Limited to leoDataModel.py:

TotTime:
 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
   8212    0.267    0.000    0.586    0.000  leoDataModel.py:1233(load_derived_file)
   8212    0.023    0.000    0.611    0.000  leoDataModel.py:1569(viter)
   8047    0.017    0.000    0.024    0.000  leoDataModel.py:1327(set_node)

Calls:
 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
  16691    0.003    0.000    0.006    0.000  leoDataModel.py:37(parPosIter)
   8212    0.268    0.000    0.587    0.000  leoDataModel.py:1233(load_derived_file)
   8212    0.023    0.000    0.612    0.000  leoDataModel.py:1569(viter)
   8047    0.017    0.000    0.023    0.000  leoDataModel.py:1327(set_node)
    971    0.000    0.000    0.001    0.000  leoDataModel.py:293(parents)
806/165    0.001    0.000    0.001    0.000  leoDataModel.py:412(updateParentSize) (in replaceNode)
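Listings like these come from pstats: one view sorted by tottime, one sorted by call count, optionally restricted by a regular expression matched against each entry's file:line(function) text. A minimal sketch, assuming a stats file written by cProfile.runctx as in loadex; the parse workload and the file name demo_stats are hypothetical.

```python
import cProfile
import io
import os
import pstats
import re
import tempfile

LINE_RE = re.compile(r'^(\w+)\s*=\s*(.*)$')

def parse(lines):
    """Hypothetical workload: parse 'key = value' lines with a regex."""
    pairs = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            pairs.append(m.groups())
    return pairs

lines = ['k%d = %d' % (i, i) for i in range(20_000)]
path = os.path.join(tempfile.mkdtemp(), 'demo_stats')
cProfile.runctx('parse(lines)', globals(), locals(), path)

out = io.StringIO()
p = pstats.Stats(path, stream=out).strip_dirs()
# "TotTime" view: time spent inside each function itself.
p.sort_stats('tottime').print_stats(5)
# "Calls" view: number of calls, largest first.
p.sort_stats('calls').print_stats(5)
# Restricted view, analogous to print_stats('leoDataModel.py', 50).
p.sort_stats('tottime').print_stats('parse', 5)
print(out.getvalue())
```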

2. Including all methods:

TotTime:
 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
   8212    0.272    0.000    0.594    0.000  leoDataModel.py:1233(load_derived_file)
 626060    0.220    0.000    0.220    0.000  {method 'match' of '_sre.SRE_Pattern' objects}
 232802    0.036    0.000    0.036    0.000  {method 'startswith' of 'str' objects}
    165    0.021    0.000    0.030    0.000  {method 'read' of '_io.TextIOWrapper' objects}

Calls:
 ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
 626060    0.221    0.000    0.221    0.000  {method 'match' of '_sre.SRE_Pattern' objects}
 232802    0.036    0.000    0.036    0.000  {method 'startswith' of 'str' objects}
 167416    0.014    0.000    0.014    0.000  {method 'append' of 'list' objects}
 110420    0.008    0.000    0.008    0.000  {built-in method builtins.len}
  95453    0.007    0.000    0.007    0.000  {method 'isspace' of 'str' objects}
  35554    0.006    0.000    0.006    0.000  {method 'group' of '_sre.SRE_Match' objects}
  16357    0.001    0.000    0.001    0.000  {method 'random' of '_random.Random' objects}
   8906    0.002    0.000    0.002    0.000  {method 'pop' of 'list' objects}
   8562    0.006    0.000    0.006    0.000  {method 'join' of 'str' objects}
   2204    0.000    0.000    0.000    0.000  {built-in method builtins.isinstance}

This is remarkable. To a first approximation, only load_derived_file 
matters. None of the helper functions or generators contributes 
substantially to the overall load time.

*Summary*

load_derived_file is incredibly fast. When loading .leo files, only its 
performance matters.

For this function alone, the speed of attribute access may be crucial. 
Converting the section references in load_derived_file into functions may 
slow the code, because references to local variables would become 
nonlocal references.
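These effects can be measured with timeit. A minimal sketch, assuming a hypothetical regex-driven loop loosely modeled on the many regex match calls in the profile: it compares looking up PAT.match on every iteration, hoisting the bound method into a local, and reading it as a nonlocal (closure) variable from a nested helper. The helper only approximates what turning a section reference into a function would do, and absolute timings depend on the machine and Python version, so none are claimed here.

```python
import re
import timeit

PAT = re.compile(r'#@')
LINES = ['#@+node:x', 'body text', '#@-others'] * 10_000

def attribute_lookup(lines):
    n = 0
    for line in lines:
        if PAT.match(line):    # global lookup + attribute lookup every time
            n += 1
    return n

def local_lookup(lines):
    match = PAT.match          # hoist the bound method into a local
    n = 0
    for line in lines:
        if match(line):        # plain local-variable lookup
            n += 1
    return n

def nonlocal_lookup(lines):
    match = PAT.match
    def helper():              # hypothetical section turned into a function
        n = 0
        for line in lines:
            if match(line):    # 'match' and 'lines' are closure variables here
                n += 1
        return n
    return helper()

for fn in (attribute_lookup, local_lookup, nonlocal_lookup):
    t = timeit.timeit(lambda: fn(LINES), number=10)
    print(f'{fn.__name__:17} {t:.3f} s')
```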

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
Visit this group at https://groups.google.com/group/leo-editor.
