Re: importer proposal

Edward K. Ream Thu, 22 Sep 2016 04:30:43 -0700


On Wed, Sep 21, 2016 at 11:11 AM, 'tfer' via leo-editor <
leo-editor@googlegroups.com> wrote:


> There are always requests for new importers or changes to existing
> importers, but few people other than Edward have wrapped their minds around
> how to do this for themselves.  This is a proposal to make it easier for
> people to create their own importers, or to customize any importer to
> create nodes at whatever level of detail they want.
>

This is a very long reply, but I want to explain exactly why the present
code is as it is.

> This would work by borrowing the Unix principle of making little programs
> that do one thing, then composing a chain of them to accomplish the task
> you want done.
>

I am aware of this design pattern.  The present code uses a different
pattern, namely having individual importers/writers override base importer
and writer classes.

The writer base class is relatively simple. Writing is much simpler than
reading.

The importer base class is complex for three distinct reasons:

1. Parsing is inherently a per-language process.  We could use a different
parsing tool for each language, but that will lead to duplicate code.  I
discuss parsing in greater detail below.

2. Given a proper parse of the language, splitting the code into separate
Leo nodes (what the importer calls code generation) is a tricky process.
We want to preserve line breaks and whitespace wherever possible.

3. We typically want to verify that the imported outline, when written,
reproduces the original file exactly.  There is a separate (*extremely
complex*) phase that does this, based on several switches in the overridden
importer classes.
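Point 3 can be illustrated with a deliberately tiny round-trip check.  This is a sketch under stated assumptions, not Leo's code: `Node`, `split_into_nodes`, `write_nodes` and `round_trip_ok` are hypothetical names, and the toy importer simply starts a new node at each line beginning with `def `.

```python
# Hypothetical sketch: none of these names are Leo's actual API.
from dataclasses import dataclass, field


@dataclass
class Node:
    headline: str
    body: str = ""
    children: list = field(default_factory=list)  # list of Node


def split_into_nodes(source):
    """Toy importer: one child node per top-level 'def', every line kept verbatim."""
    root = Node("root")
    current = root  # text before the first def stays in the root's body
    for line in source.splitlines(keepends=True):
        if line.startswith("def "):
            current = Node(line.split("(")[0][4:].strip())
            root.children.append(current)
        current.body += line
    return root


def write_nodes(node):
    """Toy writer: a node's body followed by its children, in order."""
    return node.body + "".join(write_nodes(child) for child in node.children)


def round_trip_ok(source):
    """Accept the import only if writing the tree reproduces the source exactly."""
    return write_nodes(split_into_nodes(source)) == source
```

For example, `round_trip_ok("import os\n\ndef foo(x):\n    return x  # keep me\n")` should be true, because no line, comment, or blank line is lost.  A real importer has to make the same guarantee while splitting far less regular code.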

So yes, one could turn each of these areas of code into a separate process,
but the actual code would not change much.

Take a look at the CScanner class in leo/plugins/importers/c.py.  It
consists only of a ctor that sets various switches.  All the real work is
done in leo/plugins/importers/basescanner.py.  There is *no way* to make the
C scanner simpler.
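The shape of that design (a subclass that only sets switches, with shared logic in the base class) looks roughly like this.  It is a hypothetical sketch: the class names echo the ones above, but the switch names and the `depth_change` method are invented for illustration and are not taken from basescanner.py.

```python
# Hypothetical sketch: BaseScanner, CScanner and these switch names are
# illustrative only; the real classes live in leo/plugins/importers/.
class BaseScanner:
    """Shared machinery; subclasses mostly just choose switch values."""

    def __init__(self, language, block_open="{", block_close="}", line_comment="//"):
        self.language = language
        self.block_open = block_open
        self.block_close = block_close
        self.line_comment = line_comment

    def depth_change(self, line):
        """Shared logic driven by the switches (naive: ignores strings and comments)."""
        return line.count(self.block_open) - line.count(self.block_close)


class CScanner(BaseScanner):
    """Same shape as described for c.py: nothing but a ctor setting switches."""

    def __init__(self):
        super().__init__(language="c", block_open="{", block_close="}", line_comment="//")
```

Everything a C-specific subclass could contribute is already expressed as constructor arguments, which is why there is little left to simplify on the subclass side.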

IMO, your proposal amounts to a request to refactor basescanner.py.
Perhaps that could be done, but I don't see any advantage to doing so.

> Each language would have its own default chain, but you would have the
> option of adding your own chains/parameters/programs in your file's
> "settings" node for each language whose defaults you want to override.
>

That's exactly the situation at present.  Each importer uses settings to
modify the operation of basescanner.py, but importers are free to override
various methods as needed.  For example, see importers/python.py.
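The two mechanisms (settings that adjust shared behavior, and method overrides where settings are not enough) can be sketched as follows.  The setting keys, `DEFAULTS`, and the `starts_block` method are hypothetical and do not reflect Leo's actual settings or what importers/python.py overrides.

```python
# Hypothetical sketch: the setting keys and class names are illustrative,
# not Leo's actual settings or importer API.
DEFAULTS = {"line_comment": "#", "min_body_lines": 1}


class Scanner:
    def __init__(self, settings=None):
        # User settings override the per-language defaults key by key.
        self.settings = {**DEFAULTS, **(settings or {})}

    def starts_block(self, line):
        # Base behavior: a block starts on a line ending in "{".
        return line.rstrip().endswith("{")


class PythonScanner(Scanner):
    def starts_block(self, line):
        # Overridden because Python blocks are introduced by "def"/"class" and ":".
        stripped = line.strip()
        return stripped.startswith(("def ", "class ")) and stripped.endswith(":")
```

Settings cover the common, tunable cases; an override is the escape hatch when a language's structure differs in kind rather than degree.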

Yes, the BaseScanner.Parsing methods are hairy.  But there is a reason: the
parsing code, especially the scan and scanHelper methods, must preserve line
breaks and whitespace.  Traditional parsers don't do this; they simply
create a parse tree.

I have lots of experience with Python's AST parse trees.  Annotating the
tree to show comments and whitespace is a big hole in the Python API.  I
describe the workaround in this Stack Overflow answer
<http://stackoverflow.com/questions/7456933/python-ast-with-preserved-comments/36055400#36055400>.
As you can see, there are substantial difficulties involved.
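The hole is easy to demonstrate with the standard library: `ast` discards comments and blank lines, while `tokenize` does report comments.  This only illustrates the problem; it is not the workaround described in the linked answer.  `ast.unparse` requires Python 3.9 or later.

```python
import ast
import io
import tokenize

SOURCE = "x = 1  # the answer, sort of\n\n\ny = x + 1\n"

# Round-tripping through the AST loses the comment and the blank lines.
regenerated = ast.unparse(ast.parse(SOURCE))  # Python 3.9+

# The tokenize module, by contrast, reports comments as COMMENT tokens,
# but it knows nothing about the tree structure.
comments = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
    if tok.type == tokenize.COMMENT
]
```

Here `regenerated` contains only the two statements, while `comments` holds `"# the answer, sort of"`.  Reconciling the two views is the hard part.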

But our problem is even harder: to create a *character-oriented* parser for
every imported language.  The present parsing code works, though it is ugly
behind the scenes.  Rather than using the base scanHelper method,
individual importers can override scanHelper.  That's a feasible approach,
and it may be that some importers actually do that, but it doesn't change
the overall situation.
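To make "character-oriented" concrete, here is a minimal sketch for C-like text, assuming the only goal is to split at top-level `}` characters.  `split_top_level_blocks` is a hypothetical helper, not part of basescanner.py.  Even this small task requires recognizing string literals and comments so that braces inside them are ignored, and no character may be dropped, so joining the chunks must reproduce the input exactly.

```python
def split_top_level_blocks(text):
    """Split C-like text into chunks, each ending after a top-level '}' and its line.

    Character-oriented: braces inside "strings", 'chars', // and /* */ comments
    are ignored, and no character is dropped, so "".join(chunks) == text.
    """
    chunks, start, depth, i, n = [], 0, 0, 0, len(text)
    while i < n:
        c = text[i]
        if text.startswith("//", i):  # line comment: skip to (not past) the newline
            j = text.find("\n", i)
            i = n if j < 0 else j
            continue
        if text.startswith("/*", i):  # block comment: skip past the closing */
            j = text.find("*/", i + 2)
            i = n if j < 0 else j + 2
            continue
        if c in "\"'":  # string or char literal, honoring backslash escapes
            i += 1
            while i < n and text[i] != c:
                i += 2 if text[i] == "\\" else 1
            i += 1  # step over the closing quote
            continue
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:  # end of a top-level block: cut after this line
                j = text.find("\n", i)
                end = n if j < 0 else j + 1
                chunks.append(text[start:end])
                start = i = end
                continue
        i += 1
    if start < n:
        chunks.append(text[start:])
    return chunks
```

The toy still ignores preprocessor directives and every other language-specific detail; each such detail is more per-language scanning code.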



> I imagine some of these tools would be general and then made specific to
> the language at hand by passing them things like a keyword list, regexes,
> and the like.  I'm not sure whether this would need an intermediate format
> that gets turned into nodes by a "nodeMaker" program at the end (this seems
> the likely scenario, as the Unix approach is to keep things in text as long
> as possible), or whether each program would more directly refine the
> parsing of nodes created by the previous program in the chain.
>

The tools already exist.  They are all the methods in basescanner.py.
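For comparison, the chained design proposed above might look roughly like the following sketch.  Every name here (`pipeline`, `split_before`, `node_maker`) is hypothetical, and plain `functools.reduce` stands in for a library such as pipetools, whose API is not used here.

```python
import re
from functools import reduce


def pipeline(*stages):
    """Compose stages left to right, Unix-pipe style: pipeline(f, g)(x) == g(f(x))."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


def split_before(pattern):
    """Generic tool made language-specific by a regex: new chunk at each matching line."""
    regex = re.compile(pattern)

    def stage(text):
        chunks = [""]
        for line in text.splitlines(keepends=True):
            if regex.match(line) and chunks[-1]:
                chunks.append("")
            chunks[-1] += line
        return chunks

    return stage


def node_maker(chunks):
    """Final stage: turn text chunks into (headline, body) pairs."""
    return [(chunk.splitlines()[0].strip(), chunk) for chunk in chunks if chunk]


python_chain = pipeline(split_before(r"(def|class) "), node_maker)
```

Calling `python_chain("import os\ndef a():\n    pass\n")` yields one pair per chunk.  Note, though, that the regex stage would also split at a line starting with `def ` inside a docstring; handling strings and comments correctly is exactly the per-language work described above, and the chain does not make it go away.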


> Python has support for this sort of thing in the "included batteries": the
> io module includes tools for working with strings.  To make things look more
> Unix-like, there is pipetools:  https://pypi.python.org/pypi/pipetools.
>

To summarize, your proposal amounts to a request to redesign or refactor
basescanner.py.  I'm not going to do this, absent proof that the actual
problems to be solved are much easier than I know them to be :-)

Edward
