Here, I'll be thinking out loud about how to make code generation work when using the new-style importers.
This is an engineering notebook post. Feel free to ignore.

importers/python.py contains both the old and new python importers. The new_scanner switch enables the new importer. The new python importer fails miserably when new_scanner is True. Everything works as before (including all unit tests) when new_scanner is False.

*Background*

In another thread <https://groups.google.com/forum/#!topic/leo-editor/RDi2jffWjzI> I wrote:

> If we were to convert the Python importer to use the new scheme, the entire
> ScanState class would have to be rewritten. The reason should be
> clear--Python uses indentation levels to indicate structure, not curly
> brackets.

Rev 9755cf introduces the PythonScanState class. It also moves the scan_block method out of the ScanState class and into the BaseLineScanner (BLS) class where it belongs.

The PythonScanState class is surprisingly simple. In particular, it handles backslash-newlines more simply than does the old-style importer. This is tricky to get exactly right.

> Happily, rewriting the ScanState class is *all* that would be required.
> The BLS class would remain completely unchanged, and the importer would be
> just as simple as the perl and javascript importers.

This statement was wildly optimistic. It has gradually dawned on me that there are serious problems with the code generation in the BLS class.

*Code Generation*

Code generation for javascript is easier than for python because javascript nodes may contain multiple section references. For the python (and perl) importers, only one @others directive is allowed per node.

This has important implications. The entire algorithm for breaking the input file into nodes may have to be revised.

As a practical matter, I have found the block scanning and rescanning code to be almost impossible to understand. This is surprising, but not distressing. The algorithm was always going to be complex.

I have derided the old-style importers as way too complicated.
I may have to revise that assessment :-)

The great advantage of the old-style code generators is that they handle indentation correctly in *all* situations. In particular, they handle underindented python *comment* lines properly. Such comments do *not* terminate defs or classes. I am willing to add extra indentation for such lines (with a warning), but even doing that has repercussions throughout the code.

I plan to study the old code generators today, to remind myself how they work. But before doing that, let's see what the code generators *must* do. In fact, the answer is relatively straightforward.

Each generated node, including the top-level node, will look like this:

    One or more *leading lines*
    @others, indented as discussed below
    Zero or more *trailing lines*

The top-level node will be:

    @language python
    @others

Nodes that have *no* children will consist only of the *properly indented* body of the class or def. This indentation depends on the *cumulative* indentation of all @others directives in the node's parents.

Nodes that *do* have children are the hard case. To repeat, they will look like:

    One or more leading lines
    @others, *properly* indented
    Zero or more trailing lines

There are three problems that must be solved completely:

1. Determining the leading lines.
2. Determining the indentation of the @others directive.
3. Determining the trailing lines.

None of these tasks is trivial. Furthermore, the post pass may move lines from the end of one block to the start of the next. Alas, this could affect the proper indentation of the @others directive!

*The way forward*

Clearly, the new-style code generators can do as well as the old code generators. In fact, the task of the new-style generators is *easier* than that of the old-style code generators because the new code generators work on whole lines. In the worst case, the new importers can simply mirror the old code generators.
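
As a rough illustration of the three-part node layout discussed in this post (leading lines, an indented @others, trailing lines), here is a minimal Python sketch. It is not Leo's code: split_node, leading_ws_width and the child_spans argument are hypothetical names invented for this example. It assumes the scanner has already found the line ranges of the children, that those ranges are contiguous, and that indentation uses spaces only. Underindented lines inside a child (comments, for example) are given extra indentation and reported with a warning.

```python
def leading_ws_width(line):
    """Return the number of leading blanks and tabs in line."""
    return len(line) - len(line.lstrip(' \t'))


def split_node(lines, child_spans):
    """
    Hypothetical helper, not part of Leo.

    lines:       the lines of one class or def block, newlines included.
    child_spans: sorted (start, end) index pairs, end exclusive, one per
                 child. Assumed to be contiguous.

    Return (parent_body, child_bodies, warnings).
    """
    first = child_spans[0][0]
    last = child_spans[-1][1]
    # Problem 2: the indentation of @others. Here it is assumed to be the
    # smallest indentation of the first line of any child.
    indent = min(leading_ws_width(lines[start]) for start, _ in child_spans)
    warnings = []
    child_bodies = []
    for start, end in child_spans:
        body = []
        for i in range(start, end):
            line = lines[i]
            if not line.strip():
                body.append('\n')  # Normalize blank lines.
            elif leading_ws_width(line) >= indent:
                body.append(line[indent:])  # Remove the @others indentation.
            else:
                # An underindented line: add extra indentation, with a warning.
                warnings.append(f"line {i + 1}: underindented line re-indented")
                body.append(line.lstrip(' \t'))
        child_bodies.append(body)
    # Problems 1 and 3: the leading and trailing lines stay in the parent.
    parent_body = lines[:first] + [' ' * indent + '@others\n'] + lines[last:]
    return parent_body, child_bodies, warnings


source = (
    "class A:\n"
    "    x = 1\n"
    "    def f(self):\n"
    "  # underindented comment\n"
    "        return 1\n"
    "    def g(self):\n"
    "        return 2\n"
    "    y = 2\n"
)
parent, children, warnings = split_node(source.splitlines(True), [(2, 5), (5, 7)])
print(''.join(parent))
for child in children:
    print(''.join(child))
print(warnings)
```

Note that the sketch sidesteps the hard parts: it takes the child ranges as given and does not model the post pass, which can move lines between blocks and so change the indentation computed here.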
Having said that, doing code generation the "old" way may require a complete rewrite of the code that allocates lines to nodes. Happily, adapting the old code generators to a line-oriented scheme must surely simplify them.

*Summary*

Code generation is much more challenging than I first imagined.

The ScanState class is *not* the problem. It is a brilliant invention, if I do say so myself. It completely eliminates the need to parse the imported language. It will remain a foundation of the BLS class.

Much of the BLS class may have to be rewritten, including BLS.scan and many of its helpers.

The new code generators may be based on the old. No changes *whatever* will be tolerated in the old code generators. Instead, I'll copy any needed code from the BaseScanner class to the BLS class.

Rewriting the old generators to work with the line-by-line scanner will simplify them. I relish such tasks.

The BLS class is a fundamentally important part of Leo. It should be used for *all* of Leo's importers. It is worth *any* amount of work to make the new importers as beautiful and accurate as possible.

Edward
