On Oct 28, 3:49 am, zpcspm <[email protected]> wrote:
> Sometimes I have to look at badly aligned HTML code and keep asking
> myself questions like "here's the opening tag, I wonder where it
> closes".
A progress report. I have spent several hours on this. It's an interesting problem, for several reasons:

1. HTML is more free-form than almost any other language, certainly more free-form than programming languages. As a result, weirdly formatted HTML code is common. In particular, underindented code is much more common than usual.

2. As Terry has just corrected me, the difference between one space (or tab) and many is not significant, yet the difference between one and none is significant. A redesign of the importer (tokens) may be needed.

3. The user may specify what kinds of tags are to form nodes. This actually causes few problems.

4. HTML will often contain JavaScript code that has its own syntax rules. I'm not sure this makes any difference, actually.

The importers will usually preserve meaning whatever nodes get generated, but verifying that fact is a can of worms.

The data.html file contains several examples of code that gives the importers difficulties. I've got a few unit tests that fail, which is the first step :-) I expect at least several more hours of work will be needed to straighten out everything else.

Edward
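
P.S. To make point 2 concrete, here is a minimal sketch of the whitespace rule, not the importer's actual code. The names normalize_ws and same_modulo_ws are hypothetical, and the sketch assumes plain markup with no <pre>, <textarea>, or script content, where whitespace would need different handling.

```python
import re

# One or more spaces, tabs, or line breaks in a row.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalize_ws(text: str) -> str:
    """Collapse every run of whitespace to a single space.

    A run of many blanks becomes one blank, but a position with no
    whitespace at all stays that way, so "one vs. none" is preserved.
    """
    return _WS_RUN.sub(" ", text)


def same_modulo_ws(a: str, b: str) -> bool:
    """Return True if a and b differ only in the length of whitespace runs."""
    return normalize_ws(a) == normalize_ws(b)


if __name__ == "__main__":
    one = "<b>a</b> <i>b</i>"
    many = "<b>a</b>\n\t   <i>b</i>"
    none = "<b>a</b><i>b</i>"
    print(same_modulo_ws(one, many))  # one blank vs. many blanks
    print(same_modulo_ws(one, none))  # one blank vs. no blank
```

A check along these lines might be one way for a unit test to compare imported output with the original file without tripping over reindentation, though whether it fits the importer's token design is an open question.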
