Thomas Broyer wrote:
> For those of you who, like me, don't read html5lib-commits, I've
> started a branch (treewalking-serialization) to experiment with this new
> "treewalking way".
I've been watching it with interest.
> 2007/6/12, Sam Ruby:
>>> I know there are precedents: SAX first, but also xmlpull, python's
>>> xml.dom.pulldom and .NET's XMLReader.
>> Those aren't so much filters as sources.
>
> What about Python's xml.sax.saxutils.XMLFilterBase and Java's
> org.xml.sax.XMLFilter?
>
> And you can also implement a .NET XmlReader which filters another
> XmlReader, or a Java XMLPull filtering another XMLPull, etc.
>
>> Why can't genshistream or pulldom simply be a tokenizer?
>
> They can, but then they would be called "adaptors", not "tree walkers"
> (because they're already streams, not trees).
Semantics. I'm simply suggesting that both "tree walkers" and
"adaptors" are subclasses of "sources", and any code that expects a
stream of tokens should be able to accept either.
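To make that concrete, here is a minimal sketch in Python. The token dicts and the names walk_tree, adapt_events and tag_names are hypothetical, made up for illustration; they are not html5lib's actual API. The point is only that the consumer cannot tell which kind of source it was handed.

```python
def walk_tree(node):
    """Tree walker (hypothetical): turn a nested (name, children) tuple into tokens."""
    name, children = node
    yield {"type": "StartTag", "name": name}
    for child in children:
        yield from walk_tree(child)
    yield {"type": "EndTag", "name": name}


def adapt_events(events):
    """Adaptor (hypothetical): re-label an existing (kind, name) event stream as tokens."""
    kinds = {"START": "StartTag", "END": "EndTag"}
    for kind, name in events:
        yield {"type": kinds[kind], "name": name}


def tag_names(source):
    """Consumer: accepts any iterable of tokens, whatever produced it."""
    return [token["name"] for token in source if token["type"] == "StartTag"]


tree = ("html", [("head", []), ("body", [])])
events = [
    ("START", "html"), ("START", "head"), ("END", "head"),
    ("START", "body"), ("END", "body"), ("END", "html"),
]

# Both sources produce the same token stream, so the consumer treats them alike.
print(tag_names(walk_tree(tree)))
print(tag_names(adapt_events(events)))
```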
> And if you want to serialize them as HTML (or XHTML) and be assured
> the output is wellformed, you must trust that the stream itself is
> kind of "wellformed", and who can trust the stream when it could have
> been filtered multiple times?
For what I am about to say, realize that both intertwingly.net/blog and
planet.intertwingly.net are served as application/xhtml+xml...
While I believe that well-formedness is valuable to many, it is not
without cost, and I don't believe that those for whom well-formedness is
a requirement should have to bear that cost.
In concrete terms, I have been looking at the filters that you have been
creating, and they are more complex than the ones on the trunk.
>> Part of the premise of HTML5 is that the general case of building a
>> well-formed result from a stream of tokens requires building a tree,
>> complete with adoption agency algorithms and voodoo modes.
>
> Hence my proposal to use true treewalkers (with wellformedness
> guarantee) for serialization rather than streams of tokens.
> Are you saying that it's the programmer's responsibility to build a
> tree from a stream of tokens and then use a treewalker which he
> trusts for serialization? If the source is already a treewalker,
> you're building a copy of the tree, just because you don't trust the
> output.
> With my proposal, you don't have to trust the input, because the
> serializer is the "main controller" (see my first experiments in
> serializer.py and treewalkers/_base.py in the
> treewalking-serialization branch)
"well formedness" is just one thing you may have to trust in the input.
As a concrete example, a well-formed stream may not have a <head>
element. This will affect the correct operation of the
inject-meta-charset filter.
But as I said, I am watching your branch with interest. I've committed
a test (to the trunk), and I would appreciate seeing how it could be
handled in the branch. In essence, I'm testing for two requirements:
1) if a meta charset tag exists, replace the encoding, but leave the
tag where it is.
2) if no meta charset tag exists, add one at the beginning of the head.
For purposes of this test, let's assume that a head element exists in
the stream, the stream is well formed, and that re-ordering elements in
the head isn't a good idea.
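As a rough sketch of what I mean by those two requirements, under exactly those assumptions: the token dicts below are a hypothetical shape invented for this message, not the format used on the trunk or in the branch, and only the charset attribute form of the meta tag is handled.

```python
def inject_meta_charset(tokens, encoding):
    """Yield tokens with the meta charset in <head> set to `encoding`.

    Assumes a well-formed stream that contains a head element.
    """
    buffered = None   # None outside <head>; a list of tokens while inside it
    found = False     # True once an existing meta charset has been rewritten
    for token in tokens:
        if buffered is None:
            if token["type"] == "StartTag" and token.get("name") == "head":
                buffered = [token]
            else:
                yield token
            continue
        attrs = token.get("attrs", {})
        if token.get("name") == "meta" and "charset" in attrs:
            # Requirement 1: replace the encoding, leave the tag where it is.
            token = dict(token, attrs=dict(attrs, charset=encoding))
            found = True
        buffered.append(token)
        if token["type"] == "EndTag" and token.get("name") == "head":
            if not found:
                # Requirement 2: add one right after the head start tag.
                meta = {"type": "EmptyTag", "name": "meta",
                        "attrs": {"charset": encoding}}
                buffered.insert(1, meta)
            yield from buffered
            buffered = None
    if buffered is not None:
        # The head was never closed; pass the buffered tokens through untouched.
        yield from buffered
```

The head has to be buffered because the filter cannot know whether to insert a tag at the beginning until it has seen the end of the head. That buffering is part of the cost I was referring to.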
- Sam Ruby