ENB: Rewriting the xml and html importers

Edward K. Ream Tue, 22 Nov 2016 05:32:16 -0800

This is an engineering notebook post.  Feel free to ignore, even if you are 
a dev ;-)


The html importer has been, and always will be, a trivial subclass of the 
xml importer. As a result, we need only consider the xml importer.

*The bad old days*

The old xml importer is a perfect example of what was wrong with the old 
importers. It is horrendously complex, with the complexity having 
everything to do with its base class, and nothing to do with the xml 
language itself!

Reviewing the old code, I noticed several bug fixes.  Happily, it looks 
like all those bugs are covered with unit tests, so it won't be possible to 
reintroduce those bugs.

*Strategy*

xml and html use neither brackets nor indentation to delimit structure.  
Otoh, open tags are kind of like open brackets.  Ditto for close tags. This 
similarity means we *can* use i.v2_gen_lines, but the standard 
i.v2_scan_line won't work.

There are various faux-clever ways to proceed, but by far the simplest is 
to rewrite i.v2_scan_line so it *doesn't* use a table.  Instead, it 
will be somewhat like js_i.v2_scan_line, used by the javascript importer.

There will be no Xml_ScanState.update method.  xml_i.v2_scan_line will 
update xml_state.tag_level/context directly.  The Xml_ScanState ctor will 
not follow the standard protocol.
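
To make the idea concrete, here is a minimal sketch of a table-free line scanner that updates a tag level and a comment context directly. It is not Leo's actual code: XmlScanState, scan_line and TAG are hypothetical stand-ins for Xml_ScanState and xml_i.v2_scan_line, and the sketch assumes that each tag fits on one line and that attribute values contain no '<' or '>'.

```python
import re

class XmlScanState:
    """Hypothetical stand-in for Xml_ScanState: a context and a tag level."""
    def __init__(self, context='', tag_level=0):
        self.context = context      # '' normally, '<!--' inside a comment
        self.tag_level = tag_level  # number of tags currently open

# An open, close or self-closing tag.  Simplifying assumption: the whole
# tag is on one line and contains no '<' or '>' inside attribute values.
TAG = re.compile(r'<(?P<close>/)?[A-Za-z_][\w:.-]*[^<>]*?(?P<selfclose>/)?>')

def scan_line(line, prev_state):
    """Return the state at the end of line, given the state at its start."""
    context, level = prev_state.context, prev_state.tag_level
    i = 0
    while i < len(line):
        if context == '<!--':
            # Inside a comment: skip everything up to the closing '-->'.
            j = line.find('-->', i)
            if j == -1:
                break  # The comment continues on the next line.
            context, i = '', j + 3
        elif line.startswith('<!--', i):
            context, i = '<!--', i + 4
        else:
            m = TAG.match(line, i)
            if m is None:
                i += 1  # Plain text, '<?...?>', '<!DOCTYPE ...>', etc.
                continue
            if m.group('close'):
                level = max(0, level - 1)  # A close tag acts like ')'.
            elif not m.group('selfclose'):
                level += 1                 # An open tag acts like '('.
            i = m.end()
    return XmlScanState(context, level)

# Usage: thread the state through the lines, one line at a time.
state = XmlScanState()
for line in ['<root>', '  <child>text</child>', '</root>']:
    state = scan_line(line, state)
```

The scanner changes the level and context itself and returns a fresh state, so nothing here needs an update method or a scanning table.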

That's about it.

EKR
