I found the W3C spec for binary XML:

http://www.w3.org/TR/wbxml/

I've only spent a few minutes skimming it; here are my findings:

* It is not hardwired to any particular DTD. It can be used for any XML document and preserves the full semantics of XML.
* Most tag and attribute names get tokenized to single bytes. A set of token IDs can be defined for a particular DTD to avoid having to define them in the token table in every document. This clearly offers very high compression.
* It's definitely possible for a particular document to include its own string table to define additional tokens.
* It appears possible to define tokens inline, which would allow you to use a particular tag or attribute name without having to predeclare it at the start of the stream (but since the name has to appear inline every time it's used, you don't save any space.)

I think this is definitely worth considering for Jabber. It should allow us to make the stream data much, much smaller and considerably simplify parsing.

�Jens

Reply via email to