On Wed, Sep 21, 2016 at 11:11 AM, 'tfer' via leo-editor <leo-editor@googlegroups.com> wrote:

> There are always requests for new importers or changes to existing
> importers, but few people other than Edward have wrapped their minds
> around how to do this for themselves. This is a proposal to make it
> easier for people to create their own importers or easily customize any
> importer to create nodes at whatever level of detail they want.

This is a very long reply, but I want to explain exactly why the present code is as it is.

> This would work by borrowing the Unix principles of making little programs
> that do one thing, then composing a chain of them to accomplish the task
> you want done.

I am aware of this design pattern. The present code uses a different pattern, namely having individual importers/writers override base importer and writer classes.

The writer base class is relatively simple. Writing is much simpler than reading.

The importer base class is complex for three distinct reasons:

1. Parsing is inherently a per-language process. We could use a different parsing tool for each language, but that would lead to duplicate code. I discuss parsing in greater detail below.

2. Given a proper parse of the language, splitting the code into separate Leo nodes (what the importer calls code generation) is a tricky process. We want to preserve line breaks and whitespace wherever possible.

3. We typically want to verify that writing the result of the import reproduces the original imported file. There is a separate (*extremely complex*) phase that does this, based on several switches in the overridden importer classes.

So yes, one could turn each of these areas of code into a separate process, but the actual code would not change much.

Take a look at the CScanner class in leo/plugins/importers/c.py. It consists only of a ctor that sets various switches. All the real work is done in leo/plugins/importers/basescanner.py. There is *no way* to make the C scanner simpler.

Imo, your proposal amounts to a request to refactor basescanner.py.
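
The pattern described above (a base class that does the work, and a per-language subclass whose ctor only sets switches) can be sketched roughly as follows. This is a minimal illustration, not Leo's actual code: the `split`, `check` and `run` methods and the `block_start` and `strict` switches are hypothetical names, and the regex is a deliberately crude stand-in for real C parsing.

```python
import re


class BaseScanner:
    """Hypothetical stand-in for a base importer class; not Leo's real API."""

    def __init__(self, block_start, strict=True):
        self.block_start = re.compile(block_start)  # switch: what starts a node
        self.strict = strict                        # switch: verify round trip

    def split(self, source):
        """Split source into node bodies without dropping any character."""
        nodes, current = [], []
        for line in source.splitlines(keepends=True):
            # Start a new node when a line matches the per-language pattern.
            if current and self.block_start.match(line):
                nodes.append("".join(current))
                current = []
            current.append(line)
        if current:
            nodes.append("".join(current))
        return nodes

    def check(self, source, nodes):
        """Return True if writing the nodes back reproduces the input."""
        return "".join(nodes) == source

    def run(self, source):
        nodes = self.split(source)
        if self.strict and not self.check(source, nodes):
            raise ValueError("import would not round-trip")
        return nodes


class CScanner(BaseScanner):
    """Per-language subclass: the ctor only sets switches."""

    def __init__(self):
        # Crude, illustrative pattern for a line that opens a C function.
        super().__init__(block_start=r"\w[\w\s\*]*\(", strict=True)


SOURCE = (
    "#include <stdio.h>\n"
    "\n"
    "int f(void)\n"
    "{\n"
    "  return 1;\n"
    "}\n"
    "\n"
    "int main(void)\n"
    "{\n"
    "    return f();\n"
    "}\n"
)

nodes = CScanner().run(SOURCE)
```

In this toy version the round-trip check is trivially satisfied, because `split` never alters a line. A real importer also adjusts indentation and inserts directives when it generates nodes, which is why the verification phase is much harder than this sketch suggests.
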
Perhaps such a refactoring could be done, but I don't see any advantage to doing so.

> Each language would have its own default chain, but you would have the
> option of adding your own chains/parameters/programs in your file's
> "settings" node for each language you want to override the defaults on.

That's exactly the situation at present. Each importer uses settings to modify the operation of basescanner.py, but importers are free to override various methods as needed. For example, see importers/python.py.

Yes, the BaseScanner.Parsing methods are hairy. But there is a reason: the parsing code, especially the scan and scanHelper methods, must preserve line breaks and whitespace. Traditional parsers don't do this. They simply create a parse tree.

I have lots of experience with Python's AST parse trees. Annotating the tree to show comments and whitespace is a big hole in the Python API. I describe the workaround in this Stack Overflow page
<http://stackoverflow.com/questions/7456933/python-ast-with-preserved-comments/36055400#36055400>.
As you can see, there are substantial difficulties involved.

But our problem is even harder: to create a *character-oriented* parser for every imported language. The present parsing code works, though it is ugly behind the scenes.

Rather than using the base scanHelper method, individual importers can override scanHelper. That's a feasible approach, and it may be that some importers actually do that, but it doesn't change the overall situation.

> I imagine some of these tools would be general and then made specific to
> the language at hand by passing them things like a keyword list, regexes,
> and the like. I'm not sure if this would need an intermediate format that
> gets turned into the nodes by a "nodeMaker" program at the end (though
> this seems the likely scenario, as the Unix approach is to keep things in
> text as long as possible), or if each program refines the parsing of nodes
> created by the previous program in the chain more directly.

The tools already exist. They are all the methods in basescanner.py.

> Python has support for this sort of thing in the "included batteries": the
> io module includes stuff to work with strings. To make things look more
> Unix-like, there is pipetools: https://pypi.python.org/pypi/pipetools.

To summarize, your proposal amounts to a request to redesign or refactor basescanner.py. I'm not going to do this, absent proof that the actual problems to be solved are much easier than I know them to be :-)

Edward