On Nov 10, 6:37 am, "Edward K. Ream" <[email protected]> wrote:
> On Thu, Nov 10, 2011 at 12:42 AM, zpcspm <[email protected]> wrote:
> > It looks like leo still makes a difference between lowercase and
> > uppercase tags when importing HTML.

Yes, this was a bug and a bad one. Rev 4771 fixes it. It makes a big difference.

> > - for some reason there's a "data declarations" node.

Also fixed at rev 4771. Now the html/xml parsers never generate such nodes. The big advantage of the base class/subclass organization is that such hacks are trivial to do.

> Last night I made progress on a better html token filter. This will
> allow more accurate comparisons.

Rev 4771 uses the so-called "permissive" comparison in ic.filterTokens. This allows all unit tests to pass.

> In the middle of last night I had some other ideas for more robust importing.

These involve clever, or perhaps faux-clever, ways of handling newline tokens. The 2-am Aha is that the xml/html standard allows *inserting* newlines after opening tags and before closing tags, just as the standard allows deleting such newlines. This is a good trick, as it doesn't collapse lines (which would hamper error reporting), but rather adds lines, which should clarify error reporting. Or so I think now.

The checking/reporting features of the import code are extremely complex, in part because they must work silently when unit tests pass, but must also be useful when doing actual imports.

In short, it may be a day or two, or possibly even more, before data.html gets handled to my complete satisfaction.

Edward
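
To illustrate the newline idea above: a minimal sketch, not Leo's actual import code. The names insert_tag_newlines, OPEN_TAG and CLOSE_TAG are hypothetical, and the sketch assumes the input has no comments, scripts or <pre> sections whose text must be left alone. It adds a newline after each opening tag and before each closing tag, so two sources that differ only in such newlines normalize to the same text and no existing lines are ever joined.

```python
import re

# Hypothetical patterns for illustration; not part of Leo's importer.
# An opening (or self-closing) tag that is not already followed by a newline.
OPEN_TAG = re.compile(r'(<[A-Za-z][^<>]*>)(?!\n)')
# A closing tag that is not already preceded by a newline.
CLOSE_TAG = re.compile(r'(?<!\n)(</[A-Za-z][^<>]*>)')

def insert_tag_newlines(s):
    """Return s with a newline after each opening tag and before each closing tag.

    Existing newlines are kept, so the line count can only grow.
    """
    s = OPEN_TAG.sub(r'\1\n', s)
    s = CLOSE_TAG.sub(r'\n\1', s)
    return s

if __name__ == '__main__':
    compact = '<div><p>Hi</p></div>'
    spread = '<div>\n<p>Hi\n</p></div>'
    # Both variants normalize to the same text, one tag per line.
    print(insert_tag_newlines(compact) == insert_tag_newlines(spread))
```

Because the transformation only adds lines, a mismatch found after normalizing can still be reported against a specific, short line.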
