Yonik Seeley wrote:
On 12/6/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
Also I'd be curious to see a problem with Unicode code points in XML,
if you have one handy.
The definition of valid XML 1.0 characters:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
The simplest example is code point 0. It's a valid Unicode character,
but it's not a valid XML character (even when you replace it with an
entity).
Example: <tag>NullTerminated&#0;</tag> is not valid XML
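As a rough illustration of the character definition quoted above, the ranges can be checked directly in Java. This is only a sketch: the class and method names (XmlChars, isValidXmlChar, firstInvalidIndex) are made up for this example and are not part of Lucene or of any XML library.

```java
final class XmlChars {
    private XmlChars() {}

    /** True if the code point falls in one of the XML 1.0 character ranges listed above. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index of the first char that begins a code point XML 1.0 cannot carry, or -1 if none. */
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);          // an unpaired surrogate is returned as-is and rejected
            if (!isValidXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);       // step over both halves of a surrogate pair
        }
        return -1;
    }
}
```

A serializer could call firstInvalidIndex on each term's text and fall back to an escape mechanism only when the result is not -1.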
Are you aware, though, of an existing Unicode serialization/markup
mechanism without XML's gaps?
I'm confident that XML can accommodate our needs just fine, and any
other text transmission would have to re-solve many issues that XML
has already solved.
Agreed. It wasn't a blocker, but it was something I wanted to see
tackled up front. It means adding a little more application logic to
handle escaping/unescaping.
The bottom line is I want to be able to represent the perfectly valid
Lucene query new TermQuery(new Term("field","\u0000")).
Base64 is frequently used as an escape mechanism for binary data in XML.
It has the nice property that it can be used directly as XML character
data, since its standard representation does not use any XML metacharacters.
One possible solution to the escaping issue is a standard optional
attribute named "encoding", the value of which could be extensible, with
value "base64" built into the initial implementation. Then, unless the
attribute is present, all data is taken literally. E.g. (taking Yonik's
example 'TermQuery(new Term("field","\u0000"))'):
<TermQuery>
<Term field="field" encoding="base64">AA==</Term>
</TermQuery>
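A minimal sketch of how the escaping side of this could look in Java follows. It assumes the term text is converted to bytes as UTF-8 before Base64 encoding (the proposal does not say which byte encoding to use, though a single zero byte does encode to "AA=="), and it uses java.util.Base64 from Java 8, which is newer than this thread. TermTextCodec and its methods are hypothetical names, not an existing Lucene API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class TermTextCodec {
    private TermTextCodec() {}

    /** Base64-encodes the UTF-8 bytes of the term text (UTF-8 is an assumption of this sketch). */
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    /**
     * Decodes element text according to the optional "encoding" attribute.
     * A null attribute means the text is taken literally, as proposed above.
     */
    static String decode(String elementText, String encodingAttr) {
        if (encodingAttr == null) {
            return elementText;
        }
        if ("base64".equals(encodingAttr)) {
            byte[] raw = Base64.getDecoder().decode(elementText);
            return new String(raw, StandardCharsets.UTF_8);
        }
        // The attribute value is meant to be extensible; this sketch knows only "base64".
        throw new IllegalArgumentException("Unsupported encoding: " + encodingAttr);
    }
}
```

With these helpers, TermTextCodec.encode("\u0000") yields the "AA==" shown in the example, and TermTextCodec.decode("AA==", "base64") returns the original one-character string.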
Note that this solution would limit the serialization syntax, though,
because unless there is a single attribute name for possibly-escaped
data (very unlikely, methinks), escapable text would only be
representable as text node children of elements, and *not* as attribute
values.
Steve