> So is it OK in XML to escape all other control characters
> with the &#xx; ? That seemed to be what I understood from my googling.

This is only true for XML 1.1. XML 1.1 allows all control characters
except 0x0, and makes most of them restricted characters:

  [2]  Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
       /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
  [2a] RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]

The use of the restricted characters is discouraged. See
http://www.w3.org/TR/2006/REC-xml11-20060816/#charsets for more details.

XML 1.0 blocks all C0 control characters except tab, line feed, and
carriage return:

  [2]  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
       /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

See http://www.w3.org/TR/xml/#charsets

I dislike option c (the CDATA approach) because all implementations would
need to implement their own CDATA parser (the XML parser ignores the CDATA
statements :).

I don't think we would want to restrict the allowed character set in LTK
(binary and XML), which rules out option a.

I am generally also in favour of the Base64 encoding approach because it
is probably the cleanest solution. The one problem I see is that it limits
how easily a human can read and author the LTK XML file (one of its major
use cases, right?) if control characters are heavily used.

However, the only alternative I see is some LTK-specific encoding rule for
0x0 (e.g. \NULL) combined with XML 1.1:

  <rp:ReaderFirmwareVersion>3.0.1.240\NULL</rp:ReaderFirmwareVersion>

- Christian

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On
> Behalf Of John R. Hogerhuis
> Sent: Friday, 29 February 2008 13:29
> To: LLRP Toolkit Development List
> Subject: Re: [ltk-d] Java LTK and non XML characters
>
> On Fri, Feb 29, 2008 at 9:55 AM, Christian Floerkemeier
> <[EMAIL PROTECTED]> wrote:
> >
> > I agree, but escaping via &#0; etc is not an option to my knowledge.
> > Control characters are illegal in XML, regardless of encoding.
> >
>
> Yikes... did some research based on your comment and I agree.
> Seems that the XML folks want to be nice to the C programmers
> too. OK, then in that case I think we need to do one of:
>
> a) Ban null from utf8's in LTK
> b) When they appear (which is, hopefully never), escape the entire
>    utf8 as a hex string or base64. We could put an attribute on
>    the element in the LTK-XML instance to indicate that we are
>    representing the string as xs:hexBinary.
> c) CDATA as you propose.
>
> CDATA has its own problems and complexities. If we don't do
> (a) I think I would prefer a simple hex encoding in the rare
> case that a NULL appears since the XML parser will work with
> it just fine.
>
> <rp:ReaderFirmwareVersion
>   binencode="hex">332E302E312E323400</rp:ReaderFirmwareVersion>
>
> The default encoding is raw utf-8.
>
> So is it OK in XML to escape all other control characters
> with the &#xx; ? That seemed to be what I understood from my googling.
>
> -- John.
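
For reference, here is a rough Python sketch of the character checks from
the two Char productions, plus the hex fallback from option b. This is not
LTK code: every function name is made up for illustration, and only the
`binencode="hex"` attribute is taken from John's example. It assumes the
fallback is triggered by any character outside the XML 1.0 Char range,
which is one possible policy among several.

```python
def is_xml10_char(cp):
    """True if code point cp matches the XML 1.0 Char production."""
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_char(cp):
    """True if code point cp matches the XML 1.1 Char production."""
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml11_restricted(cp):
    """True if code point cp matches the XML 1.1 RestrictedChar production."""
    return (0x1 <= cp <= 0x8
            or 0xB <= cp <= 0xC
            or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84
            or 0x86 <= cp <= 0x9F)


def encode_field(value):
    """Hypothetical helper: return (attributes, text) for a utf8 field.

    Strings that are legal XML 1.0 text pass through unchanged; anything
    else is written as the upper-case hex of its UTF-8 bytes and flagged
    with binencode="hex".
    """
    if all(is_xml10_char(ord(ch)) for ch in value):
        return {}, value
    return {"binencode": "hex"}, value.encode("utf-8").hex().upper()


def decode_field(attrs, text):
    """Hypothetical inverse of encode_field."""
    if attrs.get("binencode") == "hex":
        return bytes.fromhex(text).decode("utf-8")
    return text
```

With this, a firmware version of "3.0.1.24" followed by a NUL byte comes
out as the hex string in John's example, and an ordinary string is left
readable, so the human-authoring concern only applies to the rare fields
that actually contain control characters.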
