On Oct 28, 3:49 am, zpcspm <[email protected]> wrote:
> Sometimes I have to look at badly aligned HTML code and keep asking
> myself questions like "here's the opening tag, I wonder where it
> closes".

A progress report.

I have spent several hours on this.  It's an interesting problem, for
the following reasons:

1. HTML is more free-form than almost any other language, certainly
more free-form than other programming languages.  As a result, weirdly
formatted HTML code is common.  In particular, underindented code is
much more common than in other languages.

2. As Terry has just pointed out to me, the difference between one
space (or tab) and many is not significant, yet the difference between
one and none is significant.  The importer's handling of tokens may
need to be redesigned.

3. The user may specify what kinds of tags are to form nodes.  This
actually causes few problems.

4. HTML often contains JavaScript code, which has its own syntax
rules.  I'm not sure this actually makes any difference.
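The question that started this thread (where does a given tag close?)
can be answered without relying on indentation by pairing tags with a
stack.  What follows is a minimal sketch, not Leo's importer code,
using Python's standard html.parser module; the names TagPairer and
find_tag_pairs are hypothetical.  In the spirit of point 3, only a
user-chosen set of tags is tracked.  It assumes that an unclosed opener
is simply dropped when an enclosing tag closes, and that stray closing
tags are ignored.

```python
from html.parser import HTMLParser


class TagPairer(HTMLParser):
    """Hypothetical helper: pair opening and closing tags by line number."""

    def __init__(self, tags):
        super().__init__()
        self.tags = {t.lower() for t in tags}  # only these tags are tracked
        self.stack = []  # (tag, open_line) for currently open tags
        self.pairs = []  # (tag, open_line, close_line)

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names already lowercased.
        if tag in self.tags:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag not in self.tags:
            return
        # Search back for the nearest matching opener.  Openers skipped on
        # the way were never closed; this sketch just discards them.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                name, open_line = self.stack[i]
                del self.stack[i:]
                self.pairs.append((name, open_line, self.getpos()[0]))
                return
        # No matching opener: ignore the stray closing tag.


def find_tag_pairs(html, tags):
    """Return (tag, open_line, close_line) tuples sorted by open_line."""
    parser = TagPairer(tags)
    parser.feed(html)
    parser.close()
    return sorted(parser.pairs, key=lambda p: p[1])


# Badly indented input: the closing </div> is not aligned with its opener.
sample = """<div>
<ul>
<li>first</li>
<li>second
</ul>
    </div>
"""
print(find_tag_pairs(sample, ["div", "ul", "li"]))
# [('div', 1, 6), ('ul', 2, 5), ('li', 3, 3)]
```

In the sample, the second <li> has no closing tag, so it is dropped
when </ul> closes, even though HTML allows that end tag to be implied;
a real importer would need explicit rules for such cases.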

The importers will usually preserve meaning regardless of which nodes
get generated, but verifying that is a can of worms.  The data.html
file contains several examples of code that gives the importers
trouble.  I've got a few unit tests that fail, which is the first
step :-)  I expect at least several more hours of work will be needed
to straighten out everything else.
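One partial way to check that claim is a round-trip test: split the
input into nodes, reassemble them, and compare the result with the
original modulo insignificant whitespace.  The sketch below is not
Leo's test code.  split_at_tags is a hypothetical, deliberately naive
importer that starts a node at each opening tag in a chosen set and
strips indentation from continuation lines.  normalize assumes every
run of whitespace means the same as one space, so one-vs-many does not
matter but one-vs-none does (point 2).

```python
import re


def normalize(html: str) -> str:
    # Assumption: any run of whitespace (spaces, tabs, newlines) is
    # equivalent to one space.  Indentation changes are ignored, but the
    # presence or absence of whitespace still counts.  Wrong inside <pre>.
    return re.sub(r"\s+", " ", html)


def split_at_tags(html: str, tags: set[str]) -> list[str]:
    """Hypothetical, naive importer: start a new node at each opening tag
    whose name is in `tags`, then strip the indentation of every line in
    the node except its first, so each stripped line still follows a
    newline and no whitespace disappears entirely."""
    names = "|".join(re.escape(t) for t in sorted(tags))
    opener = re.compile(rf"<(?:{names})\b", re.IGNORECASE)
    starts = [m.start() for m in opener.finditer(html) if m.start() > 0]
    bounds = [0, *starts, len(html)]
    nodes = []
    for a, b in zip(bounds, bounds[1:]):
        first, *rest = html[a:b].split("\n")
        nodes.append("\n".join([first, *(line.lstrip() for line in rest)]))
    return nodes


def round_trip_ok(html: str, tags: set[str]) -> bool:
    """True if reassembling the nodes changes only insignificant whitespace."""
    return normalize("".join(split_at_tags(html, tags))) == normalize(html)


sample = "<div>\n      <p>Hello, <b>world</b>!</p>\n   </div>\n"
print(round_trip_ok(sample, {"div", "p"}))  # True
```

Passing such a check is necessary but not sufficient: whitespace inside
<pre>, or inside JavaScript string literals, is significant, and this
normalization would hide changes there.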

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/leo-editor?hl=en.
