Yonik wrote:
For normal text data, with valid unicode characters that aren't legal
XML, I'd rather have a simple escaping mechanism.  Something like
backslash escaping that is easily understood.  Maybe something as
simple as \00 for U+0000 (backslash followed by two hex digits).

I agree with your goal of transparency, especially for the cases of human authorship.

However, I don't agree with the idea of an application-specific escape syntax. What if someone wants to use the query metacharacter(s) ('\' in your example) literally? The usual answer is to escape the metacharacters, e.g. "\\00" to encode literal "\00". But *especially* for the human-authored cases, introduction of this complexity is less than ideal.
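
To make the metacharacter problem concrete, here is a minimal sketch of the backslash scheme, assuming backslash plus two hex digits for a control character and a doubled backslash for a literal one. The function names escape and unescape are hypothetical and are not part of any existing parser.

```python
import re

def escape(text: str) -> str:
    """Hypothetical scheme: double each backslash, and write the C0 control
    characters that XML 1.0 forbids as a backslash plus two hex digits."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")          # literal backslash must itself be escaped
        elif cp < 0x20 and ch not in "\t\n\r":
            out.append("\\%02X" % cp)   # e.g. U+0000 becomes \00
        else:
            out.append(ch)
    return "".join(out)

# A backslash followed by either another backslash or two hex digits.
_TOKEN = re.compile(r"\\(\\|[0-9A-Fa-f]{2})")

def unescape(text: str) -> str:
    """Inverse of escape(); any other use of a backslash is left untouched."""
    def repl(match: re.Match) -> str:
        body = match.group(1)
        return "\\" if body == "\\" else chr(int(body, 16))
    return _TOKEN.sub(repl, text)
```

With this sketch, someone who wants the three literal characters \00 in a term has to write \\00, which is exactly the extra rule a human author would need to remember.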

An alternative mechanism could be empty XML elements, e.g.:

<Term field="field"><UnicodeCharacter hex="00"/></Term>

Or less verbosely, with a fixed set of element names (and there are 29 of these, right?: [#x00-#x08] | #x0B | #x0C | [#x0E-#x1F]):

<TermQuery>
  <Term field="field"><Char00/></Term>
</TermQuery>
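
As a rough sketch of how the fixed-name variant could be generated, the following enumerates the code points in the ranges above (9 + 2 + 18 = 29 of them) and builds a Term element in which each one becomes an empty CharXX child. The helper names element_name and term_element are hypothetical, and only the C0 range is handled.

```python
import xml.etree.ElementTree as ET

# C0 control characters excluded by the XML 1.0 Char production:
# [#x00-#x08] | #x0B | #x0C | [#x0E-#x1F]
ILLEGAL_C0 = [cp for cp in range(0x20) if cp not in (0x09, 0x0A, 0x0D)]

def element_name(cp: int) -> str:
    """Hypothetical naming rule: 'Char' plus two uppercase hex digits."""
    return "Char%02X" % cp

def term_element(field: str, text: str) -> ET.Element:
    """Build <Term field="..."> with illegal characters as empty child elements."""
    term = ET.Element("Term", {"field": field})
    last = None  # most recently added CharXX child, if any
    for ch in text:
        if ord(ch) in ILLEGAL_C0:
            last = ET.SubElement(term, element_name(ord(ch)))
        elif last is None:
            term.text = (term.text or "") + ch   # text before the first child
        else:
            last.tail = (last.tail or "") + ch   # text after a CharXX child
    return term

print(len(ILLEGAL_C0))  # 29
print(ET.tostring(term_element("field", "a\x00b"), encoding="unicode"))
```

Because the illegal characters never appear in the serialized text, no escape metacharacter is needed; a consumer would have to reassemble the term by walking the text and child elements in document order.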

-Steve
