Re: [jruby-dev] I18n problem in StrNode and ByteList

Yoko Harada Fri, 26 Oct 2007 13:42:13 -0700

It seems that the JSR 223 implementation takes an unintended path into the
parser. JSR223's JRubyScriptEngine gives the parser an encoded string, not a
RAW one. Sounds like a JRubyScriptEngine problem, again.
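
To illustrate the difference between the two paths, here is a minimal sketch.
It is not JRuby code: the class RawBytesDemo and the method narrow are
hypothetical names, and narrow only imitates a byte-oriented consumer that
casts each char to a byte. It assumes the source text is UTF-8 encoded.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; none of these names come from JRuby.
class RawBytesDemo {

    // Imitates a byte-oriented consumer: narrowing a char to a byte
    // keeps only the low 8 bits of each char.
    static byte[] narrow(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "\u3053"; // HIRAGANA LETTER KO
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8); // E3 81 93

        // RAW path: ISO-8859-1 maps each byte to one char in 0x00..0xFF,
        // so narrowing gives back exactly the original bytes.
        String raw = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(narrow(raw), utf8)); // prints "true"

        // Encoded path: the string holds the single char U+3053,
        // and narrowing keeps only its low byte, 0x53.
        byte[] lossy = narrow(text);
        System.out.println(lossy.length + " " + (lossy[0] == 0x53)); // prints "1 true"
    }
}
```

With the RAW string every byte of the source survives the cast; with the
already decoded string the high 8 bits of each char are lost.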


-Yoko

On 10/26/07, Thomas E Enebo <[EMAIL PROTECTED]> wrote:
> Incoming string to parser should be treated as RAW or ISO8859_1 so
> that every single byte still gets read.  The parser will not properly
> read chars since Ruby only works at a byte level.  Did that make
> sense?
>
> -Tom
>
> On 10/26/07, Yoko Harada <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > While I was testing the JSR223 API, I hit a JRuby i18n bug. After fixing
> > problems in JSR223's JRubyScriptEngine, I tried to print Japanese
> > characters. This ended up in mojibake. The suspected bug resides in
> > org.jruby.ast.StrNode and org.jruby.util.ByteList, both of which handle
> > characters as a byte array. The ByteList.append(int) method, invoked from
> > the StringTerm.parseStringIntoBuffer method, casts a 16-bit char to an
> > 8-bit byte. As a result, this method drops 8 bits and causes mojibake.
> > Like other multibyte languages, Japanese needs 16 bits to express
> > a single character, so the byte type, which has only 8 bits, is too small
> > for multibyte languages. In terms of i18n, StrNode should have a
> > char-based buffer rather than a ByteList.
> >
> > But if StrNode needs to keep a byte array, the i18n problem should be
> > fixed by converting each character correctly to a byte array, as in this
> > code.
> >
> > import java.io.UnsupportedEncodingException;
> > import java.nio.ByteBuffer;
> > import java.nio.CharBuffer;
> > import java.nio.charset.CharacterCodingException;
> > import java.nio.charset.Charset;
> > import java.nio.charset.CharsetEncoder;
> > import java.util.logging.Level;
> > import java.util.logging.Logger;
> >
> > public class CharacterTest {
> >
> >     public static void main(String[] args) throws UnsupportedEncodingException {
> >         // Japanese characters are here.
> >         char[] hello = {'こ', 'ん', 'に', 'ち', 'は'};
> >         String defaultEncodingName = System.getProperty("sun.jnu.encoding");
> >         for (char c : hello) {
> >             byte[] bytes = getByteArrayFromChar(c, defaultEncodingName);
> >             System.out.print(new String(bytes, defaultEncodingName));
> >         }
> >     }
> >
> >     private static byte[] getByteArrayFromChar(char c, String encodingName) {
> >         try {
> >             CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
> >             CharBuffer cbuf = CharBuffer.allocate(1);
> >             cbuf.put(c);
> >             cbuf.flip();
> >             ByteBuffer buf = encoder.encode(cbuf);
> >             int nbytes = buf.limit();
> >             byte[] encodedBytes = new byte[nbytes];
> >             buf.get(encodedBytes);
> >             return encodedBytes;
> >         } catch (CharacterCodingException ex) {
> >             Logger.getLogger(CharacterTest.class.getName()).log(Level.SEVERE, null, ex);
> >         }
> >         return null;
> >     }
> > }
> >
> > -Yoko
> >
> > ---------------------------------------------------------------------
> > To unsubscribe from this list please visit:
> >
> >     http://xircles.codehaus.org/manage_email
> >
> >
>
>
> --
> Blog: http://www.bloglines.com/blog/ThomasEEnebo
> Email: [EMAIL PROTECTED] , [EMAIL PROTECTED]
>
