Re: Any leo plugin that can import HTML code into a tree?

Edward K. Ream Sun, 06 Nov 2011 07:54:14 -0800

On Nov 2, 4:06 pm, "Edward K. Ream" <[email protected]> wrote:


> 2. As Terry has just corrected me, the difference between one space
> (or tab) and many is not significant, yet the difference between one
> and none is significant.  A redesign of the importer (tokens) may be
> needed.

I bear tidings of great joy :-)

Sometimes reading the actual language spec is the shortest distance
between two points.  That is never truer than when reading the w3.org
documents.  They are simply superb.

The question is, "exactly what *is* whitespace in html/xml documents,
and how must it be treated?"

The first stop:  http://www.w3.org/TR/html401/struct/text.html#h-9.1

QQQQQ
9.1 White space

The document character set includes a wide variety of white space
characters. Many of these are typographic elements used in some
applications to produce particular visual spacing effects. In HTML,
only the following characters are defined as white space characters:

    ASCII space (&#x0020;)
    ASCII tab (&#x0009;)
    ASCII form feed (&#x000C;)
    Zero-width space (&#x200B;)

Line breaks are also white space characters. Note that although
&#x2028; and &#x2029; are defined in [ISO10646] to unambiguously
separate lines and paragraphs, respectively, these do not constitute
line breaks in HTML, nor does this specification include them in the
more general category of white space characters.
QQQQQ
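
As a rough illustration of the "one versus many" point quoted at the top,
here is a minimal Python sketch that collapses each run of the white space
characters listed above (plus CR and LF for line breaks) into a single
space.  The name collapse_ws is hypothetical; this is not code from Leo's
importers.

```python
import re

# White space per HTML 4.01 section 9.1: space, tab, form feed,
# zero-width space, plus CR and LF for line breaks.
# U+2028 and U+2029 are deliberately excluded, as the spec says.
_WS_RUN = re.compile(r"[ \t\f\u200b\r\n]+")

def collapse_ws(text):
    """Replace each run of HTML white space with one space (hypothetical helper)."""
    return _WS_RUN.sub(" ", text)
```

With this, "a  \t b" and "a b" compare equal after collapsing, while "a b"
and "ab" still differ, which is the distinction Terry pointed out.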

Following the "Line breaks" link at the start of the last paragraph
yields: http://www.w3.org/TR/html401/appendix/notes.html#notes-line-breaks

QQQQQ
B.3.1 Line breaks

SGML (see [ISO8879], section 7.6.1) specifies that a line break
immediately following a start tag must be ignored, as must a line
break immediately before an end tag. This applies to all HTML elements
without exception.

The following two HTML examples must be rendered identically:

<P>Thomas is watching TV.</P>

<P>
Thomas is watching TV.
</P>

So must the following two examples:

<A>My favorite Website</A>

<A>
My favorite Website
</A>
QQQQQ
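
The rule in B.3.1 can be sketched in a few lines of Python.  This is only a
sketch under simplifying assumptions: tags are recognized with a naive
regex (no comments, no ">" inside attribute values), and the function name
drop_ignorable_line_breaks is hypothetical, not part of Leo.

```python
import re

# One line break directly after a start tag such as <P> or <A href="x">.
_AFTER_START = re.compile(r"(<[A-Za-z][^<>]*>)(?:\r\n|\r|\n)")
# One line break directly before an end tag such as </P>.
_BEFORE_END = re.compile(r"(?:\r\n|\r|\n)(</[A-Za-z][^<>]*>)")

def drop_ignorable_line_breaks(html):
    """Remove the line breaks that B.3.1 says must be ignored."""
    html = _AFTER_START.sub(r"\1", html)
    return _BEFORE_END.sub(r"\1", html)
```

Applied to the second form of each example above, this yields the first
form; line breaks in the middle of the text are left alone.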

Eureka!  This is just one of many examples of the deep thought that has
gone into the w3 standards.  This is *precisely* what Leo needs in
order to be able to put html elements in different nodes!

Now I am particularly glad that Leo's importers have moved to
token-based comparisons.  It will take some more work to be able to
detect newlines after opening tags and before closing tags, but that
doesn't matter.  Leo's importers now have the "right stuff" to do so.
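
To make the idea concrete, here is a hedged sketch of what a token-based
comparison along these lines might look like.  It is NOT Leo's importer
code: the names tokens, _strip_leading_break and _strip_trailing_break are
made up for illustration, and the tokenizer is a naive regex that treats
anything of the form <...> as a tag.

```python
import re

_TOKEN = re.compile(r"<[^<>]+>|[^<]+")      # a tag, or a run of text
_WS_RUN = re.compile(r"[ \t\f\u200b\r\n]+")  # HTML white space, incl. CR/LF
_LINE_BREAKS = ("\r\n", "\r", "\n")          # check CRLF first

def _strip_leading_break(text):
    for br in _LINE_BREAKS:
        if text.startswith(br):
            return text[len(br):]
    return text

def _strip_trailing_break(text):
    for br in _LINE_BREAKS:
        if text.endswith(br):
            return text[:-len(br)]
    return text

def tokens(html):
    """Split html into tag and text tokens, normalized for comparison."""
    raw = _TOKEN.findall(html)
    result = []
    for i, tok in enumerate(raw):
        if tok.startswith("<"):
            result.append(tok)
            continue
        prev_tok = raw[i - 1] if i > 0 else ""
        next_tok = raw[i + 1] if i + 1 < len(raw) else ""
        # Ignore one line break right after a start tag ...
        if prev_tok.startswith("<") and not prev_tok.startswith("</"):
            tok = _strip_leading_break(tok)
        # ... and one line break right before an end tag.
        if next_tok.startswith("</"):
            tok = _strip_trailing_break(tok)
        # One white space character and many are equivalent; none is not.
        tok = _WS_RUN.sub(" ", tok)
        if tok:
            result.append(tok)
    return result
```

Two sources would then count as equivalent when their token lists are
equal, for example tokens("<P>\nThomas is watching TV.\n</P>") and
tokens("<P>Thomas is watching TV.</P>").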

Edward

