Complications reading sentinels

Edward K. Ream Mon, 28 Jun 2010 06:50:29 -0700

Here are some notes to myself about reading thin files.  Feel free to
ignore.


There are at least two conflicting goals:

1. Leave the old read code unchanged when new_read is False.

2. Simplify the code and use the same code as much as possible when
reading both old and new style sentinels.

I think goal 2 will win eventually.  It's becoming clear that mixing
the old and new data structures (v.tempBodyString vs. v.tempBodyList)
will make the code impossible to understand in six months.  I cannot
allow that to happen.

As an interim step, I'll define a "unified" top-level switch in
leoAtFile.py.  When False, the *old* read will remain unchanged.  When
True, the old read code will use the same data structures as the new.
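A minimal sketch of how such a switch might select the accumulator.  UNIFIED, VNodeStub, append_line and finish_body are hypothetical names for illustration, not Leo's API; only the attribute names tempBodyString and tempBodyList come from this post.  The point is that both paths must yield identical body text.

```python
# Hypothetical sketch: a module-level "unified" switch selecting how body
# text is accumulated during a read.  None of these names are Leo's API.

UNIFIED = True  # assumption: stands in for the proposed switch in leoAtFile.py


class VNodeStub:
    """Minimal stand-in for a vnode while its body is being read."""

    def __init__(self):
        self.tempBodyString = ""  # old-style accumulator
        self.tempBodyList = []    # new-style accumulator


def append_line(v, line, unified=UNIFIED):
    """Append one line of body text to whichever structure the switch selects."""
    if unified:
        v.tempBodyList.append(line)
    else:
        v.tempBodyString += line


def finish_body(v, unified=UNIFIED):
    """Return the completed body; both paths must produce the same text."""
    return "".join(v.tempBodyList) if unified else v.tempBodyString
```

Appending to a list and joining once avoids repeated string concatenation, but the real benefit claimed in this post is simply having one data structure to reason about.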

Testing

The unit tests for thin files are skimpy.  At present, I test the read
code by ensuring that *writing* all of Leo's files leaves those files
unchanged, that is, that round-tripping files is stable.  This is a
quick sanity test, but it's not enough.
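A minimal sketch of the round-trip check, using a toy one-sentinel-per-node format.  The "#@+node:" syntax here is a simplified placeholder, not Leo's real sentinel grammar, and write_nodes, read_nodes and round_trip_is_stable are hypothetical helpers.  It also shows why the test is not enough: it only detects reads that lose text, and a reader and writer that share a bug can still pass.

```python
# Toy stand-in for a thin-file format; NOT Leo's real sentinel syntax.
SENTINEL = "#@+node:"


def write_nodes(nodes):
    """Serialize (headline, body) pairs, one sentinel line per node."""
    parts = []
    for headline, body in nodes:
        parts.append(f"{SENTINEL}{headline}\n")
        parts.append(body)
    return "".join(parts)


def read_nodes(text):
    """Parse text written by write_nodes back into (headline, body) pairs."""
    nodes = []
    for line in text.splitlines(keepends=True):
        if line.startswith(SENTINEL):
            nodes.append([line[len(SENTINEL):].rstrip("\n"), []])
        elif nodes:
            nodes[-1][1].append(line)
        # Lines before the first sentinel are silently dropped (a lossy read).
    return [(headline, "".join(body)) for headline, body in nodes]


def round_trip_is_stable(text):
    """The sanity test: reading and then writing must reproduce the text."""
    return write_nodes(read_nodes(text)) == text
```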

Issues and complications

1.  Removing the at.out stack.  The Aha that all writing goes to at.v,
the current node, seemed to promise that we could eliminate the at.out
stack.  However, at.readNonl adjusts the contents of the top of this
stack when handling the @nonl directive.  It should be possible to
examine at.v.tempBodyList instead.
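A minimal sketch of that idea, assuming that handling @nonl amounts to removing the trailing newline from the most recently read body line.  read_nonl is a hypothetical helper; the list it receives stands in for at.v.tempBodyList, so no at.out stack is consulted.

```python
def read_nonl(temp_body_list):
    """Hypothetical @nonl handling on a per-node list of body lines.

    Assumes @nonl means the preceding body line had no newline of its own,
    and that temp_body_list (standing in for at.v.tempBodyList) holds the
    current node's lines, so the top of an at.out stack is not needed.
    """
    if temp_body_list and temp_body_list[-1].endswith("\n"):
        temp_body_list[-1] = temp_body_list[-1][:-1]
    return temp_body_list
```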

2.  v.tempBodyString vs. v.tempBodyList.  My first thought was that
v.tempBodyList would be a clean alternative to v.tempBodyString.  That
is, that the old code would use v.tempBodyString and the new code
would use v.tempBodyList.  Alas, things are not so easy, as the next
two items show...

3. Handling @first and @last.  The present code uses tempBodyString to
handle @first and @last.  It's not clear that it is possible simply to
use v.tempBodyList instead, although that's what the new code does
just now.
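A minimal sketch of what a list-based approach could look like, assuming @first lines are the lines that precede the first sentinel in the external file and @last lines are those that follow the final sentinel.  The "#@" prefix, split_first_last and prepend_first_lines are illustrative assumptions, not Leo's code.

```python
SENTINEL_PREFIX = "#@"  # assumption: sentinel comment prefix for a Python file


def split_first_last(lines):
    """Split file lines into (first, middle, last) around the sentinel region.

    'first' precedes the first sentinel (candidates for @first);
    'last' follows the final sentinel (candidates for @last).
    """
    idx = [i for i, line in enumerate(lines)
           if line.lstrip().startswith(SENTINEL_PREFIX)]
    if not idx:
        return [], list(lines), []
    start, end = idx[0], idx[-1]
    return list(lines[:start]), list(lines[start:end + 1]), list(lines[end + 1:])


def prepend_first_lines(temp_body_list, first):
    """With a list, @first lines can be spliced in at the front, in place."""
    temp_body_list[0:0] = first
    return temp_body_list
```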

4. Node-changed logic.  At present (that is, in the code that reads
old thin sentinels) at.readEndNode has some complicated logic to
determine whether the node being read has been changed externally.  If
it has, we add an item to c.nodeConflictList.  Eventually, this causes
code in leoFileCommands.py to create conflict nodes in the outline.
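A greatly simplified sketch of the idea, not the real at.readEndNode logic.  Node, Commander and end_node_check are hypothetical stand-ins; only the name nodeConflictList comes from this post.  A conflict is recorded only when the outline already has a body that differs from the text just read from the file.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an outline node."""
    headline: str
    body: str = ""


@dataclass
class Commander:
    """Hypothetical stand-in for c; only nodeConflictList is modeled."""
    nodeConflictList: list = field(default_factory=list)


def end_node_check(c, v, new_body):
    """Record a conflict if the file's text differs from the outline's body."""
    conflict = bool(v.body) and v.body != new_body
    if conflict:
        c.nodeConflictList.append((v.headline, v.body, new_body))
    v.body = new_body  # the file's version wins; the conflict keeps the old text
    return conflict
```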

Problems with the node-changed logic:

A. This code depends on at.v.tempBodyString (i.e., *old* format data)
as well as some other data.

B. The old scheme guarantees that at.readEndNode is called for every
node because there is an @-node sentinel for every node.  This isn't
true (or so obviously true) for the new scheme.  True, the new code
calls at.readEndNode to close nodes when seeing @-others and @-<<, but
at present there is (and can be) no such call when an @+node
terminates the previous node.

A refactoring may help.  We could define something like at.endBodyText
that would do the job without messing with the stacks, and call
at.endBodyText within at.readEndNode and also, in the new logic, at
the start of at.readStartNode to terminate the previous node.  Or
something like that :-)
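A minimal sketch of that refactoring, under simplifying assumptions: old-style @+node nests the new node inside the current one, while new-style @+node is treated as ending the previous sibling.  ReaderSketch and its methods only mirror the names at.endBodyText, at.readEndNode and at.readStartNode; they are hypothetical and not Leo's code.

```python
class ReaderSketch:
    """Hypothetical reader showing a shared end_body_text helper."""

    def __init__(self, new_format):
        self.new_format = new_format
        self.v = None        # (headline, lines) of the node receiving text
        self.stack = []      # enclosing nodes (old format only)
        self.finished = {}   # headline -> completed body text

    def end_body_text(self):
        """Finish the current node's body without touching any stack."""
        if self.v is not None:
            headline, lines = self.v
            self.finished[headline] = "".join(lines)

    def read_start_node(self, headline):
        if self.new_format:
            # No @-node sentinel exists, so @+node must end the previous node.
            self.end_body_text()
        elif self.v is not None:
            # Old format: the current node encloses the new one; resume later.
            self.stack.append(self.v)
        self.v = (headline, [])

    def read_end_node(self):
        self.end_body_text()
        self.v = self.stack.pop() if self.stack else None

    def add_line(self, line):
        self.v[1].append(line)
```

Usage: in the old format, read_end_node finishes a child and resumes its parent; in the new format, each read_start_node finishes the preceding node.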

C. The old code finessed duplicate (cloned) nodes in a way that probably
isn't so easy to do in the new code.  Somehow the node-changed logic
in at.readEndNode does the correct thing when it sees an identical
copy of a cloned node that it has already seen.  The new code must
take care to do the same.
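A minimal sketch of the behavior the new code must preserve, assuming each node has a stable id (gnx-style) shared by all its cloned copies.  read_clone_copy is a hypothetical helper: an identical second copy is silently accepted, while a differing copy is recorded as a conflict.

```python
def read_clone_copy(seen, gnx, body, conflicts):
    """Hypothetical duplicate-clone handling during a read.

    seen maps a node id (gnx) to the body read for its first copy.
    Returns True if this copy duplicates a node already read.
    """
    if gnx not in seen:
        seen[gnx] = body
        return False
    if seen[gnx] != body:
        # Two copies of the same clone disagree: not a harmless duplicate.
        conflicts.append((gnx, seen[gnx], body))
    return True
```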

5. Handling doc parts.  The old code uses at.inCode as a switch to
determine whether we are accumulating an @doc part or not.  I'd like
to get rid of this switch and its associated at.docOut data
structure.  However, "looking ahead" when seeing @space or @doc might
create a complex interaction between at.scanText4 (the main loop) and
the lookahead code.  Instead, the present new code hacks
at.readDirective so that it sets at.inCode = False when @@c is seen.
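A simplified model of the switch, assuming for illustration that the writer emits doc-part lines as "# "-prefixed comments that the reader must strip.  read_doc_parts is hypothetical; in_code and doc_out stand in for at.inCode and at.docOut, and the bare "@doc", "@", "@c" and "@code" markers are placeholders, not real sentinels.  Buffering the doc part lets the reader treat it as a unit when the part closes.

```python
def read_doc_parts(lines, comment="# "):
    """Hypothetical: accumulate doc parts using an in_code switch."""
    body, doc_out, in_code = [], [], True
    for line in lines:
        stripped = line.strip()
        if stripped in ("@doc", "@"):
            # Start of a doc part: flush any pending part, switch modes.
            body.extend(doc_out)
            doc_out, in_code = [], False
            body.append(line)
        elif stripped in ("@c", "@code"):
            # Like the hacked at.readDirective: @c ends the doc part.
            body.extend(doc_out)
            doc_out, in_code = [], True
            body.append(line)
        elif in_code:
            body.append(line)
        else:
            # Inside a doc part the writer used comments; strip the prefix.
            doc_out.append(line[len(comment):] if line.startswith(comment) else line)
    body.extend(doc_out)  # a doc part may run to the end of the node
    return body
```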

BTW, eliminating at.inCode and at.docOut probably has no chance of
working if "real" sentinels can appear between @+doc (or @+space) and
the sentinel that closes the doc part: @-doc/@-space (old sentinels)
or @@c (new sentinels).  More research is needed...

Ok, I've just done the research :-)  The relevant code is the code
that *writes* @doc parts:
@thin leoAtFile.py-->at.Writing-->Writing 4.x-->writing doc lines...
and also
@thin leoAtFile.py-->at.Writing-->Writing 4.x-->putBody

From the code it's difficult to determine what kinds of sentinels can
appear between the sentinels that open and close a doc part.  I may do
some experiments, but it looks too risky to mess with this part of the
read code.  Thus, at.inCode and at.docOut likely will remain forever.

Conclusions

The old read is subtly complex in ways that I did not fully appreciate
when I considered using v.tempBodyList as an alternative to at.out and
v.tempBodyString.

My guess at present is that the best long-term solution will be to
eliminate both v.tempBodyString and at.out, but this plan carries
significant risk.  I'll try to minimize the risk by using a "unified"
switch to mark where the *old* read code will change.  In effect, this
switch will mark where goal 1 is violated.

It may be that a new suite of unit tests will be needed to test the
code that reads thin sentinels.

For all these reasons, it is essential that this work be done in a
separate branch.

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
