On Nov 10, 6:37 am, "Edward K. Ream" <[email protected]> wrote:
> On Thu, Nov 10, 2011 at 12:42 AM, zpcspm <[email protected]> wrote:

> > It looks like leo still makes a difference between lowercase and
> > uppercase tags when importing HTML.

Yes, this was a bug and a bad one.  Rev 4771 fixes it.  It makes a big
difference.

> > - for some reason there's a "data declarations" node.

Also fixed at rev 4771.  Now the html/xml parsers never generate such
nodes.

The big advantage of the base class/subclass organization is that such
hacks are trivial to do.
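
To illustrate why such a change is small, here is a minimal sketch of the
pattern.  It is not Leo's actual import code: the class names, the
create_decls_node flag and the scan method are all hypothetical.  The base
class decides how nodes are built; the markup subclass flips one switch
and matches tag names without regard to case.

```python
class BaseScanner:
    """Hypothetical base class that splits source lines into nodes."""

    # Subclasses set this to False to suppress "data declarations" nodes.
    create_decls_node = True

    def is_start_of_unit(self, line):
        raise NotImplementedError

    def scan(self, lines):
        """Return (root_body, child_nodes) for a list of source lines."""
        leading, children = [], []
        for line in lines:
            if self.is_start_of_unit(line):
                children.append([line])      # start a new child node
            elif children:
                children[-1].append(line)    # continue the current node
            else:
                leading.append(line)         # text before the first unit
        nodes = [("unit", "\n".join(c)) for c in children]
        if leading and self.create_decls_node:
            nodes.insert(0, ("data declarations", "\n".join(leading)))
            leading = []
        # Leading text not moved to a declarations node stays in the root.
        return "\n".join(leading), nodes


class PythonLikeScanner(BaseScanner):
    def is_start_of_unit(self, line):
        return line.startswith("def ")


class MarkupScanner(BaseScanner):
    create_decls_node = False  # never generate a declarations node

    def is_start_of_unit(self, line):
        # Lowercase first, so <HTML> and <html> are treated alike.
        return line.lstrip().lower().startswith(("<html", "<head", "<body"))
```

With this arrangement, MarkupScanner().scan(["<!DOCTYPE html>", "<HTML>",
"</HTML>"]) keeps the doctype line in the root body instead of creating a
separate node for it.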

> Last night I made progress on a better html token filter.  This will
> allow more accurate comparisons.

Rev 4771 uses the so-called "permissive" comparison in
ic.filterTokens.  This allows all unit tests to pass.
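
As a rough illustration of what a permissive comparison can look like, the
sketch below drops whitespace tokens and lowercases tag names before
comparing two token streams.  This is an assumption about the general
technique, not the code in ic.filterTokens; tokenize, filter_tokens and
same_permissive are hypothetical names.

```python
import re

# A tag, a run of non-space text, or a run of whitespace.
TOKEN_RE = re.compile(r"<[^>]*>|[^<\s]+|\s+")
# The leading "<" or "</" of a tag, followed by the tag name.
TAG_NAME_RE = re.compile(r"^(</?)([A-Za-z][\w:-]*)")


def tokenize(s):
    """Split s into tag, text and whitespace tokens."""
    return TOKEN_RE.findall(s)


def filter_tokens(tokens):
    """Drop whitespace tokens and lowercase tag names (not attributes)."""
    result = []
    for tok in tokens:
        if tok.isspace():
            continue  # permissive: whitespace differences are ignored
        if tok.startswith("<"):
            tok = TAG_NAME_RE.sub(
                lambda m: m.group(1) + m.group(2).lower(), tok)
        result.append(tok)
    return result


def same_permissive(a, b):
    """True if a and b match after permissive filtering."""
    return filter_tokens(tokenize(a)) == filter_tokens(tokenize(b))
```

Under this filter, "<DIV>\n  hello\n</DIV>" and "<div>hello</div>" compare
equal, while a change to the text itself is still reported as a mismatch.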

> In the middle of last night I had some other ideas for more robust importing.

These involve clever, or perhaps faux-clever, ways of handling newline
tokens.  The 2 a.m. Aha is that the xml/html standard allows *inserting*
newlines after opening tags and before closing tags, just as the
standard allows deleting such newlines.  This is a good trick, as it
doesn't collapse lines (which would hamper error reporting), but
rather increases the number of lines, which should clarify error
reporting.
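
A minimal sketch of the newline-insertion idea, assuming a simple
regex-based pass rather than whatever the import code ends up doing;
insert_newlines and both patterns are hypothetical.  A newline is added
after each opening tag and before each closing tag unless one is already
there, so the line count only ever grows.

```python
import re

# An opening (or self-closing) tag not already followed by a newline.
OPEN_TAG = re.compile(r"(<[A-Za-z][^<>]*>)(?!\n)")
# A closing tag not already preceded by a newline.
CLOSE_TAG = re.compile(r"(?<!\n)(</[A-Za-z][^<>]*>)")


def insert_newlines(s):
    """Put each tag boundary on its own line without removing any line."""
    s = OPEN_TAG.sub(r"\1\n", s)
    s = CLOSE_TAG.sub(r"\n\1", s)
    return s
```

Because existing newlines are left alone, running the pass a second time
changes nothing, which keeps line numbers stable between the transformed
text and any error report that refers to it.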

Or so I think now.  The checking/reporting features of the import code
are extremely complex, in part because they must work silently when
unit tests pass, but must also be useful when doing actual imports.

In short, it may be a day or two, or possibly even more, before
data.html gets handled to my complete satisfaction.

Edward
