Re: About the serializer, filters and tree walkers...

Sam Ruby Wed, 13 Jun 2007 17:19:14 -0700

Thomas Broyer wrote:
> For those of you who, like me, don't read html5lib-commits, I've
> started a branch (treewalking-serialization) to experiment with this
> new "treewalking way".


I've been watching it with interest.

> 2007/6/12, Sam Ruby:
>>> I know there are precedents: SAX first, but also xmlpull, python's
>>> xml.dom.pulldom and .NET's XMLReader.
>> Those aren't so much filters as sources.
> 
> What about Python's xml.sax.saxutils.XMLFilterBase and Java's
> org.xml.sax.XMLFilter?
> 
> And you can also implement a .NET XmlReader which filters another
> XmlReader, or a Java XMLPull filtering another XMLPull, etc.
> 
>> Why can't genshistream or pulldom simply be a tokenizer?
> 
> They can, but then they would be called "adaptors", not "tree walkers"
> (because they're already streams, not trees).

Semantics.  I'm simply suggesting that both "tree walkers" and 
"adaptors" are subclasses of "sources", and any code that expects a 
stream of tokens should be able to accept either.
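
A minimal sketch of that idea, not code from html5lib: the token shape (a dict with "type", "name" and "data" keys), the Node class, and the names tree_walker, adaptor and serialize are all assumptions made for illustration. Escaping and attributes are left out to keep it short.

```python
from typing import Iterable, Iterator

Token = dict  # hypothetical shape: {"type": ..., "name": ..., "data": ...}


class Node:
    """Hypothetical in-memory tree node; text nodes carry `text` only."""

    def __init__(self, name=None, children=(), text=None):
        self.name = name
        self.children = list(children)
        self.text = text


def tree_walker(node: Node) -> Iterator[Token]:
    """Source that produces tokens by walking a tree."""
    if node.text is not None:
        yield {"type": "Characters", "data": node.text}
        return
    yield {"type": "StartTag", "name": node.name}
    for child in node.children:
        yield from tree_walker(child)
    yield {"type": "EndTag", "name": node.name}


def adaptor(events: Iterable[tuple]) -> Iterator[Token]:
    """Source that adapts an existing stream of (kind, value) pairs."""
    kinds = {"start": "StartTag", "end": "EndTag"}
    for kind, value in events:
        if kind == "text":
            yield {"type": "Characters", "data": value}
        else:
            yield {"type": kinds[kind], "name": value}


def serialize(source: Iterable[Token]) -> str:
    """Consumer that only cares that it receives a stream of tokens."""
    out = []
    for tok in source:
        if tok["type"] == "StartTag":
            out.append("<%s>" % tok["name"])
        elif tok["type"] == "EndTag":
            out.append("</%s>" % tok["name"])
        else:
            out.append(tok["data"])
    return "".join(out)
```

Here serialize() accepts either source without knowing whether a tree or another stream sits behind it.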

> And if you want to serialize them as HTML (or XHTML) and be assured
> the output is wellformed, you must trust that the stream itself is
> kind of "wellformed", and who can trust the stream when it could have
> been filtered multiple times?

For what I am about to say, realize that both intertwingly.net/blog and 
planet.intertwingly.net are served as application/xhtml+xml...

While I believe that well-formedness is valuable to many, it is not 
without cost, and I don't believe that those for whom well-formedness is 
not a requirement should have to bear that cost.

In concrete terms, I have been looking at the filters that you have been 
creating, and they are more complex than the ones on the trunk.

>> Part of the premise of HTML5 is that the general case of building a
>> well-formed result from a stream of tokens requires building a tree,
>> complete with adoption agency algorithms and voodoo modes.
> 
> Hence my proposal to use true treewalkers (with wellformedness
> guarantee) for serialization rather than streams of tokens.
> Are you saying that it's the programmer's responsibility to build a
> tree from a stream of tokens and then use a treewalker which he
> trusts for serialization? If the source is already a treewalker,
> you're building a copy of the tree, just because you don't trust the
> output.
> With my proposal, you don't have to trust the input, because the
> serializer is the "main controller" (see my first experiments in
> serializer.py and treewalkers/_base.py in the
> treewalking-serialization branch)

"Well-formedness" is just one thing you may have to trust in the input. 
As a concrete example, a well-formed stream may not have a <head> 
element.  This will affect the correct operation of the 
inject-meta-charset filter.

But as I said, I am watching your branch with interest.  I've committed 
a test (to the trunk), and I would appreciate seeing how the branch could 
handle it.  In essence, I'm testing for two requirements:

  1) if a meta charset tag exists, replace the encoding, but leave the
     tag where it is.
  2) if no meta charset tag exists, add one at the beginning of the head

For purposes of this test, let's assume that a head element exists in 
the stream, the stream is well formed, and that re-ordering elements in 
the head isn't a good idea.
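
One way the two requirements could be met over a plain token stream is sketched below. This is not the test or the filter committed to the trunk: the token shape (dicts with "type", "name" and "data" keys, where "data" is an attribute dict on tags) and the function name inject_meta_charset are assumptions, and only the <meta charset=...> form is handled. It relies on the assumptions above: a head element with both start and end tags is present in a well-formed stream.

```python
def inject_meta_charset(tokens, encoding):
    """Yield tokens so that the head carries a meta charset of `encoding`.

    Hypothetical token shape: {"type": ..., "name": ..., "data": {...}}.
    """
    in_head = False
    pending = []   # head content, buffered until we know if a meta exists
    found = False
    for tok in tokens:
        kind, name = tok["type"], tok.get("name")
        if not in_head:
            yield tok
            if kind == "StartTag" and name == "head":
                in_head, pending, found = True, [], False
            continue
        attrs = tok.get("data") or {}
        if kind in ("StartTag", "EmptyTag") and name == "meta" and "charset" in attrs:
            # Requirement 1: replace the encoding, leave the tag in place.
            tok = dict(tok, data=dict(attrs, charset=encoding))
            found = True
        if kind == "EndTag" and name == "head":
            if not found:
                # Requirement 2: add a meta at the beginning of the head.
                yield {"type": "EmptyTag", "name": "meta",
                       "data": {"charset": encoding}}
            yield from pending
            yield tok
            in_head = False
        else:
            pending.append(tok)
```

Note that the head has to be buffered, because whether a meta charset tag exists is only known once the end of the head is reached.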

- Sam Ruby


