On Monday, May 14, 2018 at 9:44:11 AM UTC-5, Edward K. Ream wrote:
>
>
> On Mon, May 14, 2018 at 8:02 AM, vitalije <vitali...@gmail.com> wrote:
>
>> I have just published first two articles about what I have done so far in 
>> this mini-project. Here are the links:
>>
>>    1. Leo tree model - new approach 
>>    <https://computingart.net/leo-tree-model-1.html>
>>    2. Leo tree model - loading from a file 
>>    <https://computingart.net/leo-tree-model-2.html>
>>
> This is superb work.  I never dreamed the code could be improved so much.
>

Still true, after more reflection. Some comments:


*Strategy*
I have confidence that this project will be a success.  Sitting in the 
bathtub yesterday, I realized that code details don't matter much.  The 
only things that matter are:

1. Existing Leo scripts must not be impacted in *any* way.

In particular, all existing Position methods and generators must have 
exactly the same *effect* as before.  Many (All?) Position methods and 
generators will need to be rewritten.  That's fine. The new code will be 
simpler than the old code.

2. All existing unit tests must pass.

Naturally, unit tests of low-level details of VNodes and Positions can 
change as necessary. (A sketch of the kind of equivalence test I have in 
mind follows this list.)
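
For concreteness, here is a minimal sketch of such a test. It assumes the 
test harness supplies the Commander as self.c; the helper 
reference_preorder is hypothetical, and it uses only public Position 
methods:

import unittest

def reference_preorder(p, out):
    '''Append headlines in pre-order, using only the children generator.'''
    out.append(p.h)
    for child in p.children():
        reference_preorder(child, out)

class TestGeneratorEffects(unittest.TestCase):

    def test_all_positions_matches_reference(self):
        c = self.c  # Assumed: the harness injects the Commander.
        expected = []
        p = c.rootPosition()
        while p:
            reference_preorder(p, expected)
            p.moveToNext()
        # c.all_positions() must visit the same nodes in the same order,
        # before and after the rewrite.
        self.assertEqual([q.h for q in c.all_positions()], expected)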


*Summary of code*
I've studied the code more thoroughly.  My high-level summary of the read 
process:

Part 1: Create tuples.

Use xml.etree.ElementTree when reading .leo files.  Use regexes when 
reading external files.
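
To make Part 1 concrete, here is a minimal sketch for .leo files. It 
assumes the usual <vnodes>/<v>/<vh> and <tnodes>/<t> layout of a .leo 
file; the tuple layout and the name read_leo_file are illustrative, not 
vitalije's actual code:

import xml.etree.ElementTree as ET

def read_leo_file(path):
    '''Return {gnx: (headline, body, child gnxs)} from a .leo file.'''
    root = ET.parse(path).getroot()
    # Body text lives in <tnodes>, keyed by gnx.
    bodies = {t.get('tx'): t.text or '' for t in root.iter('t')}
    attrs = {}

    def visit(v):
        gnx = v.get('t')
        if gnx not in attrs:  # Later <v> entries for a gnx are clones.
            vh = v.find('vh')
            h = (vh.text or '') if vh is not None else ''
            children = [child.get('t') for child in v.findall('v')]
            attrs[gnx] = (h, bodies.get(gnx, ''), children)
        for child in v.findall('v'):
            visit(child)

    for v in root.find('vnodes').findall('v'):
        visit(v)
    return attrs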

Part 2: Use the tuples to create vnodes.

This is non-trivial, because @others and section references alter the tree 
structure.
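
A toy illustration of what makes it non-trivial: the tree-building pass 
must notice where a node's body pulls its children, or a named section, 
into the middle of the text. These patterns are illustrative, not Leo's 
actual scanners:

import re

others_pat = re.compile(r'^\s*@others\b', re.MULTILINE)
section_pat = re.compile(r'<<\s*(.+?)\s*>>')

def expansion_points(body):
    '''Return ('@others' or None, [section names]) found in a body.'''
    others = '@others' if others_pat.search(body) else None
    return others, section_pat.findall(body)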

I think that's all that most devs will need to know.

*Readability will not affect performance*

attrs[gnx] is a tuple, [h, b, ps, chn, sz[0]]. The components should be 
accessed via a bunch or an enum, say:

e_h = 0
e_b = 1
e_ps = 2
...
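
The enum spelling could use enum.IntEnum, whose members are ints and so 
index lists and tuples directly. The field meanings in the comments are 
my guesses from the names:

from enum import IntEnum

class E(IntEnum):
    '''Indices into attrs[gnx].'''
    h = 0    # headline
    b = 1    # body text
    ps = 2   # parents (guess)
    chn = 3  # children (guess)
    sz = 4   # subtree size (guess)

# Usage: E.h is just the int 0, so indexing works unchanged.
# h, b = attrs[gnx][E.h], attrs[gnx][E.b]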

To prove that this has no effect on performance, I added a global count, 
gCount, of the number of times a[n] was accessed in leoDataModel.py during 
the tests run in test_leo_data_model.py.  Here is the instrumented run, 
with the new count last:

tree size 8137 read in 488.24ms files:164
ok 471 471
Average: 24.82ms
checking write
ok
tree correct 5327576
pickle avg: 20.38ms
upickle avg: 14.93ms
profiling write_all
ok 695.47ms
gCount: 50723
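
For reference, one minimal way to gather such a count without touching 
every call site is a list subclass; this is my sketch, not the actual 
instrumentation:

gCount = 0

class CountingList(list):
    '''A list that tallies every a[n] access in gCount.'''
    def __getitem__(self, n):
        global gCount
        gCount += 1
        return list.__getitem__(self, n)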

And here is the timeit script showing the time penalty of using an enum 
constant, e_gnx, instead of a hard-coded index into the a array.

import timeit

number = 50000

setup1 = 'a = [0]'
stmt1 = 'b = a[0]'  # Hard-coded index.
n1 = timeit.timeit(stmt=stmt1, setup=setup1, number=number)

setup2 = 'e_gnx = 0; a = [0]'
stmt2 = 'b = a[e_gnx]'  # Named constant.
n2 = timeit.timeit(stmt=stmt2, setup=setup2, number=number)

print('n1:    %8.6f' % n1)
print('n2:    %8.6f' % n2)
print('n2-n1: %8.6f' % (n2 - n1))

And here is the output on my machine:

n1:    0.001246
n2:    0.001259
n2-n1: 0.000013

Edward
