Here are some notes to myself about reading thin files. Feel free to ignore.
There are at least two conflicting goals:

1. Leave the old read code unchanged when new_read is False.

2. Simplify the code by using the same code as much as possible when reading both old-style and new-style sentinels.

I think goal 2 will win eventually. It's becoming clear that mixing the old and new data structures (v.tempBodyString vs. v.tempBodyList) will make the code impossible to understand in six months. I cannot allow that to happen.

As an interim step, I'll define a "unified" top-level switch in leoAtFile.py. When False, the *old* read code will remain unchanged. When True, the old read code will use the same data structures as the new.

Testing

The unit tests for thin files are skimpy. At present, I test the read code by ensuring that *writing* all of Leo's files leaves those files unchanged, that is, that round-tripping files is stable. This is a quick sanity test, but it's not enough.

Issues and complications

1. Removing the at.out stack. The Aha that all writing goes to at.v, the current node, seemed to promise that we could eliminate the at.out stack. However, at.readNonl adjusts the contents of the top of this stack when handling the @nonl directive. It should be possible to examine at.v.tempBodyList instead.

2. v.tempBodyString vs. v.tempBodyList. My first thought was that v.tempBodyList would be a clean alternative to v.tempBodyString. That is, the old code would use v.tempBodyString and the new code would use v.tempBodyList. Alas, things are not so easy, as the next two items show...

3. Handling @first and @last. The present code uses tempBodyString to handle @first and @last. It's not clear that it is possible simply to use v.tempBodyList instead, although that's what the new code does just now.

4. Node-changed logic. At present (that is, in the code that reads old thin sentinels) at.readEndNode has some complicated logic to determine whether the node being read has been changed externally. If it has, we add an item to c.nodeConflictList.
Eventually, this causes code in leoFileCommands.py to create conflict nodes in the outline.

Problems with the node-changed logic:

A. This code depends on at.v.tempBodyString (i.e., *old*-format data) as well as some other data.

B. The old scheme guarantees that at.readEndNode is called for every node, because there is an @-node sentinel for every node. This isn't true (or isn't so obviously true) for the new scheme. True, the new code calls at.readEndNode to close nodes when it sees @-others and @-<<, but at present there is (and can be) no such call when an @+node terminates the previous node.

A refactoring may help. We could define something like at.endBodyText that would do the job without messing with the stacks, and call at.endBodyText within at.readEndNode and also, in the new logic, at the start of at.readStartNode to terminate the previous node. Or something like that :-)

C. The old code finessed duplicate (cloned) nodes in a way that probably isn't so easy to do in the new code. Somehow the node-changed logic in at.readEndNode does the correct thing when it sees an identical copy of a cloned node that it has already seen. The new code must take care to do the same.

5. Handling doc parts. The old code uses at.inCode as a switch to determine whether we are accumulating an @doc part or not. I'd like to get rid of this switch and its associated at.docOut data structure. However, "looking ahead" when seeing @space or @doc might create a complex interaction between at.scanText4 (the main loop) and the lookahead code. Instead, the present new code hacks at.readDirective so that it sets at.inCode = False when @@c is seen.

BTW, eliminating at.inCode and at.docOut probably has no chance of working if "real" sentinels can appear between @+doc (or @+space) and the closing @-doc/@-space (old sentinels) or @@c sentinel (new sentinels). More research is needed...
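To make items 1 and 4 a bit more concrete, here is a toy model of what a list-based reader might look like. Everything in it is hypothetical: Node, ToyReader, end_body_text and the other names are made up for illustration and are not Leo's real classes or methods, and the conflict check is a deliberate simplification of whatever at.readEndNode really does. It sketches three ideas from the notes: @nonl adjusting the current node's list instead of the top of at.out, a single end_body_text helper called both from read_end_node and from read_start_node (so that a new-style @+node terminates the previous node), and a clone check that reports no conflict when an identical copy of an already-read node shows up again.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a vnode (not Leo's real class)."""
    gnx: str
    body: str = ""  # body text already present in the outline
    temp_body_list: list[str] = field(default_factory=list)  # lines read so far


class ToyReader:
    """Toy thin-file reader; all names are hypothetical, for illustration only."""

    def __init__(self) -> None:
        self.v: Node | None = None  # node currently receiving text
        self.node_conflict_list: list[Node] = []
        self.seen: dict[str, str] = {}  # gnx -> text read for its first copy

    def read_start_node(self, v: Node) -> None:
        # New-style sentinels: starting a node implicitly ends the previous one.
        if self.v is not None:
            self.end_body_text()
        self.v = v
        v.temp_body_list = []

    def read_line(self, line: str) -> None:
        # All text goes to the current node; there is no separate output stack.
        self.v.temp_body_list.append(line)

    def read_nonl(self) -> None:
        # @nonl: drop the trailing newline of the last line read so far,
        # by examining the current node's list rather than an output stack.
        lines = self.v.temp_body_list
        if lines and lines[-1].endswith("\n"):
            lines[-1] = lines[-1][:-1]

    def read_end_node(self) -> None:
        # Old-style sentinels: an explicit end-of-node.
        if self.v is not None:
            self.end_body_text()
        self.v = None

    def end_body_text(self) -> None:
        # Finish the current node without touching any stacks.
        v = self.v
        new_body = "".join(v.temp_body_list)
        if v.gnx in self.seen:
            # A later copy of an already-read (cloned) node:
            # only text that differs from the first copy is a conflict.
            if self.seen[v.gnx] != new_body:
                self.node_conflict_list.append(v)
        else:
            self.seen[v.gnx] = new_body
            # Crude stand-in for "changed externally": the outline already
            # held different, non-empty text. This toy then adopts the file's text.
            if v.body and v.body != new_body:
                self.node_conflict_list.append(v)
            v.body = new_body
        v.temp_body_list = []
```

For example, starting node a (outline body "old\n"), reading "new\n" and then starting node b ends a implicitly and records it in node_conflict_list, while a later identical copy of b adds nothing. Whether this shape survives @first/@last and doc parts (items 3 and 5) is exactly the open question.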
Ok, I've just done the research :-) The relevant code is the code that *writes* @doc parts:

@thin leoAtFile.py-->at.Writing-->Writing 4.x-->writing doc lines...

and also

@thin leoAtFile.py-->at.Writing-->Writing 4.x-->putBody

From the code it's difficult to determine what kinds of sentinels can appear between the sentinels that open and close a doc part. I may do some experiments, but it looks too risky to mess with this part of the read code. Thus, at.inCode and at.docOut will likely remain forever.

Conclusions

The old read code is subtly complex in ways that I did not fully appreciate when I considered using v.tempBodyList as an alternative to at.out and v.tempBodyString.

My guess at present is that the best long-term solution will be to eliminate both v.tempBodyString and at.out, but this plan carries significant risk. I'll try to minimize the risk by using a "unified" switch to mark where the *old* read code will change. In effect, this switch will mark where goal 1 is violated.

A new suite of unit tests may be needed to test the code that reads thin sentinels.

For all these reasons, it is essential that this work be done in a separate branch.

Edward

--
You received this message because you are subscribed to the Google Groups "leo-editor" group.
For more options, visit this group at http://groups.google.com/group/leo-editor?hl=en.
