From: Jongjin Choi [mailto:[EMAIL PROTECTED] 
        Sent: Tuesday, December 28, 2004 11:56 AM
        To: [email protected]
        Subject: UTF8Encoder question...
        
        
        Dims and all, 
         
        UTF8Encoder writes an escaped string when a character is above 0x7F. 
        The escaping does not seem to be necessary, because 
        a Writer (not an OutputStream) is used. 
         
        I think this could be just : (line 86)
         
        writer.write(character);
         
        instead of : (line 86 ~ 88)
        writer.write("&#x");
        writer.write(Integer.toHexString(character).toUpperCase());
        writer.write(";");
         
        The escaping just increases the message size.

Yes, it does. However, I think representing a character whose code point is
above 0x7F as an &#x numeric character reference is one of the aims of the
encoder, because some systems cannot display such characters properly when
they lack fonts with broad Unicode coverage. If it is 100% certain that every
node in a messaging system can handle characters written as-is in an XML
instance, then a compact encoder, as you pointed out, would be much more
efficient than UTF8Encoder. Interestingly, AbstractXMLEncoder (which cannot
be instantiated) already works that way. So it would be a good idea to create
a new encoder that optimizes message size and to make it easy to select
through configuration. (Yes, we can recommend it to users dealing with
non-Latin character systems :-)
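
To put rough numbers on the size trade-off, here is a minimal sketch. The
class and method names (EncoderSizeSketch, writeEscaped, escaped, utf8Length)
are hypothetical and are not taken from the UTF8Encoder source; the escaping
branch only mirrors the three writer.write calls quoted in this thread, and
the handling of XML special characters such as < and & is left out.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any real encoder.
class EncoderSizeSketch {

    // Escaping variant: every char above 0x7F becomes an &#xHEX; reference.
    // Like the quoted snippet, this works per UTF-16 char, so characters
    // outside the Basic Multilingual Plane are not handled correctly here.
    static void writeEscaped(Writer writer, String text) throws IOException {
        for (int i = 0; i < text.length(); i++) {
            char character = text.charAt(i);
            if (character > 0x7F) {
                writer.write("&#x");
                writer.write(Integer.toHexString(character).toUpperCase());
                writer.write(";");
            } else {
                // Compact variant would do this for every character.
                writer.write(character);
            }
        }
    }

    // Convenience wrapper that returns the escaped form as a String.
    static String escaped(String text) {
        StringWriter writer = new StringWriter();
        try {
            writeEscaped(writer, text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }

    // Size on the wire if the text is eventually serialized as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String text = "\uD55C\uAE00"; // two Hangul syllables
        String escapedText = escaped(text);
        System.out.println(escapedText);             // &#xD55C;&#xAE00;
        System.out.println(utf8Length(escapedText)); // 16
        System.out.println(utf8Length(text));        // 6
    }
}
```

For this two-character sample the escaped form takes 16 bytes against 6 for
the as-is form, which is where the gain for non-Latin text would come from.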

Happy new year,

Ias

P.S. I'm going to switch [EMAIL PROTECTED] to [EMAIL PROTECTED] (soon,
very soon).

         
        If an OutputStream were used, either the escaping or the UTF-8
        conversion (which existed in the old UTF8Encoder.java) would be needed.
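
A minimal sketch of the OutputStream case, assuming the standard
java.io.OutputStreamWriter is used to do the UTF-8 conversion. The class and
method names (StreamWriterSketch, toUtf8Bytes) are hypothetical, and this is
not the code from the old UTF8Encoder.java.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any real encoder.
class StreamWriterSketch {

    // Writes the text unescaped through a Writer that wraps a byte stream.
    // The OutputStreamWriter performs the char-to-UTF-8 conversion, so the
    // caller never has to escape characters above 0x7F for that purpose.
    static byte[] toUtf8Bytes(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and flushed) before the bytes are read.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // U+00E9 is encoded as the two bytes C3 A9 in UTF-8.
        for (byte b : toUtf8Bytes("\u00E9")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```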
         
        Thoughts?
         
        /Jongjin
