Re: [jruby-dev] I18n problem in StrNode and ByteList

Thomas E Enebo Fri, 26 Oct 2007 13:20:11 -0700

The incoming string should be handed to the parser as RAW or ISO8859_1
so that every single byte still gets read.  The parser will not read
multibyte chars properly, since Ruby only works at the byte level.
Does that make sense?
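
A minimal sketch of why ISO8859_1 keeps every byte intact: decoding with
that charset maps each byte to exactly one char in the range 0..255, so
encoding back (or narrowing each char to a byte) returns the original
bytes.  The class and method names here are hypothetical and not part of
JRuby, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {

    // Decode raw bytes as ISO-8859-1: one byte becomes one char (0..255).
    static String toRawString(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Encode back to bytes; for chars in 0..255 this is lossless.
    static byte[] toBytes(String rawString) {
        return rawString.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Five hiragana characters, written as escapes so the source
        // file encoding does not matter.
        String hello = "\u3053\u3093\u306B\u3061\u306F";
        byte[] utf8 = hello.getBytes(StandardCharsets.UTF_8);

        String raw = toRawString(utf8);
        byte[] back = toBytes(raw);

        System.out.println(raw.length() == utf8.length); // one char per byte
        System.out.println(Arrays.equals(utf8, back));   // bytes survive
        System.out.println(new String(back, StandardCharsets.UTF_8).equals(hello));
    }
}
```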


-Tom

On 10/26/07, Yoko Harada <[EMAIL PROTECTED]> wrote:
> Hi,
>
> While I was testing the JSR223 API, I hit a JRuby i18n bug. After fixing
> problems in JSR223's JRubyScriptEngine, I tried to print Japanese
> characters. This ended up as mojibake. The suspected bug resides in
> org.jruby.ast.StrNode and org.jruby.util.ByteList, both of which handle
> characters as a byte array. The ByteList.append(int) method, invoked from
> the StringTerm.parseStringIntoBuffer method, casts a 16-bit char to an
> 8-bit byte. As a result, this method drops 8 bits and causes mojibake. As
> in other multibyte languages, a Japanese character needs 16 bits to
> express a single character, so the byte type, which has only 8 bits, is
> too small for multibyte languages. In terms of i18n, StrNode should have
> a char-based buffer rather than a ByteList.
>
> But if StrNode needs to keep a byte array, the i18n problem should be
> fixed by converting each character correctly to a byte array, as in this
> code.
>
> import java.io.UnsupportedEncodingException;
> import java.nio.ByteBuffer;
> import java.nio.CharBuffer;
> import java.nio.charset.CharacterCodingException;
> import java.nio.charset.Charset;
> import java.nio.charset.CharsetEncoder;
> import java.util.logging.Level;
> import java.util.logging.Logger;
>
> public class CharacterTest {
>
>     public static void main(String[] args) throws UnsupportedEncodingException {
>         char[] hello = {'こ', 'ん', 'に', 'ち', 'は'}; // Japanese characters are here.
>         String defaultEncodingName = System.getProperty("sun.jnu.encoding");
>         for (char c : hello) {
>             byte[] bytes = getByteArrayFromChar(c, defaultEncodingName);
>             System.out.print(new String(bytes, defaultEncodingName));
>         }
>     }
>
>     private static byte[] getByteArrayFromChar(char c, String encodingName) {
>         try {
>             CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
>             CharBuffer cbuf = CharBuffer.allocate(1);
>             cbuf.put(c);
>             cbuf.flip();
>             ByteBuffer buf = encoder.encode(cbuf);
>             int nbytes = buf.limit();
>             byte[] encodedBytes = new byte[nbytes];
>             buf.get(encodedBytes);
>             return encodedBytes;
>         } catch (CharacterCodingException ex) {
>             Logger.getLogger(CharacterTest.class.getName()).log(Level.SEVERE, null, ex);
>         }
>         return null;
>     }
> }
>
> -Yoko
>
> ---------------------------------------------------------------------
> To unsubscribe from this list please visit:
>
>     http://xircles.codehaus.org/manage_email
>
>
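
The truncation described in the quoted message can be reproduced with a
plain narrowing cast.  This is only a sketch of the effect; the class and
method names are hypothetical and it does not call ByteList itself.

```java
public class TruncationDemo {

    // Narrowing a 16-bit char to an 8-bit byte keeps only the low 8 bits.
    static byte truncate(char c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        char ko = '\u3053'; // hiragana "ko", the first character of the greeting
        byte low = truncate(ko);
        // The high byte 0x30 is lost; only 0x53 remains.
        System.out.printf("U+%04X -> 0x%02X%n", (int) ko, low & 0xFF);
    }
}
```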


-- 
Blog: http://www.bloglines.com/blog/ThomasEEnebo
Email: [EMAIL PROTECTED] , [EMAIL PROTECTED]

