Yonik Seeley wrote:
On 12/6/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
Also I'd be curious to see a problem with Unicode code points in XML,
if you have one handy.

The definition of valid XML 1.0 characters:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

The simplest example is code point 0.  It's a valid Unicode character,
but it's not a valid XML character (even when you replace it with a
numeric character reference).
Example: <tag>NullTerminated&#0;</tag>  is not well-formed XML
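
As a rough illustration, the character ranges quoted above can be checked directly in Java. This is only a sketch: the class and method names (XmlChars, isValidXmlChar, isXmlSafe) are made up for this example and are not part of Lucene or any XML library.

```java
// Hypothetical helper, for illustration only: tests text against the
// XML 1.0 character ranges quoted above.
final class XmlChars {
    private XmlChars() {}

    /** True if the code point falls in one of the XML 1.0 character ranges. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if every code point in s may appear in XML 1.0 character data. */
    static boolean isXmlSafe(String s) {
        for (int i = 0; i < s.length(); ) {
            // An unpaired surrogate is returned as-is and rejected below.
            int cp = s.codePointAt(i);
            if (!isValidXmlChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

With this sketch, XmlChars.isXmlSafe("NullTerminated\u0000") is false, so a serializer would know the term text needs some escape mechanism before it is written out.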

Are you aware, though, of an existing Unicode serialization/markup mechanism without XML's gaps?

I'm confident that XML can accommodate our needs just fine, and any
other text transmission would have to re-solve many issues that XML
has already solved.

Agreed.  It wasn't a blocker, but it was something I wanted to see
tackled up front.  It means adding a little more application logic to
handle escaping/unescaping.

The bottom line is I want to be able to represent the perfectly valid
Lucene query new TermQuery(new Term("field","\u0000")).

Base64 is frequently used as an escape mechanism for binary data in XML. It has the nice property that it can be used directly as XML character data, since its standard representation does not use any XML metacharacters.

One possible solution to the escaping issue is a standard optional attribute named "encoding", the value of which could be extensible, with value "base64" built into the initial implementation. Then, unless the attribute is present, all data is taken literally. E.g. (taking Yonik's example 'TermQuery(new Term("field","\u0000"))'):

<TermQuery>
  <Term field="field" encoding="base64">AA==</Term>
</TermQuery>
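
A minimal sketch of how the "encoding" attribute might be handled in Java follows. It assumes the term text is converted to bytes as UTF-8 before base64 encoding (the proposal above does not specify this, but it is consistent with "\u0000" becoming "AA=="), and it uses java.util.Base64 from Java 8 or later. The class and method names (TermTextCodec, encode, decode) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical codec for the proposed "encoding" attribute; not a Lucene API.
final class TermTextCodec {
    private TermTextCodec() {}

    /** Encodes term text as base64 over its UTF-8 bytes (UTF-8 is an assumption). */
    static String encode(String termText) {
        byte[] bytes = termText.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(bytes);
    }

    /**
     * Decodes element text according to the value of the "encoding" attribute.
     * A null value means the attribute was absent, so the text is taken literally.
     */
    static String decode(String text, String encoding) {
        if (encoding == null) {
            return text;
        }
        if ("base64".equals(encoding)) {
            byte[] bytes = Base64.getDecoder().decode(text);
            return new String(bytes, StandardCharsets.UTF_8);
        }
        // Other values could be registered later; this sketch supports only base64.
        throw new IllegalArgumentException("Unsupported encoding: " + encoding);
    }
}
```

Here TermTextCodec.encode("\u0000") yields "AA==", matching the example above, and TermTextCodec.decode("AA==", "base64") restores the original term text.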

Note that this solution would limit the serialization syntax, though, because unless there is a single attribute name for possibly-escaped data (very unlikely, methinks), escapable text would only be representable as text node children of elements, and *not* as attribute values.

Steve
