The incoming string to the parser should be treated as RAW or ISO8859_1
so that every single byte still gets read. The parser will not read
chars properly, since Ruby only works at the byte level. Does that make
sense?
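
As a rough illustration of the point (a sketch only, not JRuby code): if
the source bytes are decoded as ISO-8859-1, every char falls in the range
0x00..0xFF, so narrowing each char to a byte gives back the original
bytes. Narrowing real 16-bit chars loses the upper 8 bits. The class name
RawBytesSketch and the helper narrow() are made up for this example, and
it assumes the source text arrives as UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RawBytesSketch {

    // Hypothetical helper: narrows each char to a byte with a plain cast,
    // similar in spirit to the char-to-byte cast described in the report.
    static byte[] narrow(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Five hiragana characters, written as escapes to avoid depending
        // on the encoding of this source file.
        String source = "\u3053\u3093\u306b\u3061\u306f";
        byte[] utf8 = source.getBytes(StandardCharsets.UTF_8);

        // Decoded as ISO-8859-1, each byte becomes exactly one char,
        // so the cast back to byte is lossless.
        String raw = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(utf8, narrow(raw))); // prints true

        // Narrowing the real 16-bit chars keeps only the low 8 bits of each.
        byte[] lossy = narrow(source);
        System.out.println(lossy.length);                // prints 5
        System.out.println(Arrays.equals(utf8, lossy));  // prints false
    }
}
```

The round trip works because ISO-8859-1 maps each byte value to the
Unicode code point with the same number, which is why it can stand in
for a "raw" encoding here.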
-Tom
On 10/26/07, Yoko Harada <[EMAIL PROTECTED]> wrote:
> Hi,
>
> While I was testing the JSR223 API, I hit a JRuby i18n bug. After fixing
> problems in JSR223's JRubyScriptEngine, I tried to print Japanese
> characters. This ended up in Mojibake. The suspected bug resides in
> org.jruby.ast.StrNode and org.jruby.util.ByteList, both of which handle
> characters as a byte array. The ByteList.append(int) method, invoked from
> the StringTerm.parseStringIntoBuffer method, casts a 16-bit char to an
> 8-bit byte. As a result, this method drops 8 bits and causes Mojibake.
> Like other multibyte languages, Japanese needs 16 bits to express
> a single character, so the byte type, which has only 8 bits, is too small
> for multibyte languages. In terms of i18n, StrNode should have a
> char-based buffer rather than a ByteList.
>
> But if StrNode needs to keep a byte array, the i18n problem should be
> fixed by converting each character to a byte array correctly, as in
> this code.
>
> import java.io.UnsupportedEncodingException;
> import java.nio.ByteBuffer;
> import java.nio.CharBuffer;
> import java.nio.charset.CharacterCodingException;
> import java.nio.charset.Charset;
> import java.nio.charset.CharsetEncoder;
> import java.util.logging.Level;
> import java.util.logging.Logger;
>
> public class CharacterTest {
>
>     public static void main(String[] args) throws UnsupportedEncodingException {
>         // Japanese characters are here.
>         char[] hello = {'こ', 'ん', 'に', 'ち', 'は'};
>         String defaultEncodingName = System.getProperty("sun.jnu.encoding");
>         for (char c : hello) {
>             byte[] bytes = getByteArrayFromChar(c, defaultEncodingName);
>             System.out.print(new String(bytes, defaultEncodingName));
>         }
>     }
>
>     // Encodes a single char into the bytes of the named encoding.
>     private static byte[] getByteArrayFromChar(char c, String encodingName) {
>         try {
>             CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
>             CharBuffer cbuf = CharBuffer.allocate(1);
>             cbuf.put(c);
>             cbuf.flip();
>             ByteBuffer buf = encoder.encode(cbuf);
>             int nbytes = buf.limit();
>             byte[] encodedBytes = new byte[nbytes];
>             buf.get(encodedBytes);
>             return encodedBytes;
>         } catch (CharacterCodingException ex) {
>             Logger.getLogger(CharacterTest.class.getName()).log(Level.SEVERE,
>                     null, ex);
>         }
>         return null;
>     }
> }
>
> -Yoko
>
> ---------------------------------------------------------------------
> To unsubscribe from this list please visit:
>
> http://xircles.codehaus.org/manage_email
>
>
--
Blog: http://www.bloglines.com/blog/ThomasEEnebo
Email: [EMAIL PROTECTED] , [EMAIL PROTECTED]