Yonik wrote:
> For normal text data, with valid unicode characters that aren't legal
> XML, I'd rather have a simple escaping mechanism. Something like
> backslash escaping that is easily understood. Maybe something as
> simple as \00 for U+0000 (backslash followed by two hex digits).

I agree with your goal of transparency, especially for the cases of
human authorship.
However, I don't agree with the idea of an application-specific escape
syntax. What if someone wants to use the query metacharacter(s) ('\' in
your example) literally? The usual answer is to escape the
metacharacters, e.g. "\\00" to encode literal "\00". But *especially*
for the human-authored cases, introduction of this complexity is less
than ideal.
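
To make the cost concrete, here is a minimal sketch of what such a backslash scheme implies, written in Python purely for illustration. The function names `escape` and `unescape` are hypothetical and not part of any existing parser; the sketch assumes exactly two hex digits after the backslash and a doubled backslash for a literal one.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def escape(text: str) -> str:
    """Encode control characters as \\XX and double every literal backslash."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")  # the metacharacter itself must be escaped
        elif ord(ch) < 0x20 and ch not in "\t\n\r":
            out.append("\\%02X" % ord(ch))  # backslash plus two hex digits
        else:
            out.append(ch)
    return "".join(out)


def unescape(text: str) -> str:
    """Reverse escape(); raise ValueError on a malformed escape sequence."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(text[i])
            i += 1
        elif text.startswith("\\\\", i):
            out.append("\\")  # doubled backslash is one literal backslash
            i += 2
        else:
            digits = text[i + 1:i + 3]
            if len(digits) != 2 or not set(digits) <= HEX_DIGITS:
                raise ValueError("bad escape at offset %d" % i)
            out.append(chr(int(digits, 16)))
            i += 3
    return "".join(out)
```

With this scheme, `unescape(r"\00")` yields the single character U+0000, while a human who wants the three literal characters `\00` has to remember to write `\\00`. That second rule is the extra burden on authors.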
An alternative mechanism could be empty XML elements, e.g.:
<Term field="field"><UnicodeCharacter hex="00"/></Term>
Or less verbosely, with a fixed set of element names (and there are 29
of these, right?: [#x00-#x08] | #x0B | #x0C | [#x0E-#x1F]):
<TermQuery>
<Term field="field"><Char00/></Term>
</TermQuery>
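
As a rough sketch of how a parser might turn either element form back into term text, here is an illustration in Python using the standard `xml.etree.ElementTree` module. The helper `term_text` and the constant `XML_ILLEGAL_CONTROLS` are hypothetical names made up for this example, and the handling of mixed text and character elements is an assumption, not something the existing parser does.

```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 does not allow, per the ranges above.
XML_ILLEGAL_CONTROLS = [*range(0x00, 0x09), 0x0B, 0x0C, *range(0x0E, 0x20)]

# Matches the fixed element names Char00 .. Char1F (hypothetical naming).
CHAR_ELEMENT = re.compile(r"Char([0-9A-Fa-f]{2})")


def term_text(term: ET.Element) -> str:
    """Rebuild a term's text, expanding character elements in place."""
    parts = [term.text or ""]
    for child in term:
        if child.tag == "UnicodeCharacter":
            code = int(child.get("hex", ""), 16)  # ValueError if missing
        else:
            match = CHAR_ELEMENT.fullmatch(child.tag)
            if match is None:
                raise ValueError("unexpected element: %s" % child.tag)
            code = int(match.group(1), 16)
        if code not in XML_ILLEGAL_CONTROLS:
            raise ValueError("not an XML-illegal control: %#04x" % code)
        parts.append(chr(code))
        parts.append(child.tail or "")  # text following the empty element
    return "".join(parts)


term = ET.fromstring('<Term field="field">ab<Char00/>cd</Term>')
assert term_text(term) == "ab\x00cd"
```

No escaping rule is needed here: a literal backslash in the term text stays a backslash, and `len(XML_ILLEGAL_CONTROLS)` gives the size of the fixed element-name set.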
-Steve