Hi,
While testing the JSR223 API, I hit an i18n bug in JRuby. After fixing
problems in JSR223's JRubyScriptEngine, I tried to print Japanese
characters, and the output came out as mojibake. The bug appears to
reside in org.jruby.ast.StrNode and org.jruby.util.ByteList, both of
which handle characters as a byte array. The ByteList.append(int)
method, invoked from StringTerm.parseStringIntoBuffer, casts a 16-bit
char to an 8-bit byte. As a result, the upper 8 bits are dropped, which
causes the mojibake. Like characters in other multibyte languages, a
Japanese character needs 16 bits, so the byte type, which has only
8 bits, is too small to hold one. In terms of i18n, StrNode should have
a char-based buffer rather than a ByteList.
But if StrNode needs to keep a byte array, the i18n problem should be
fixed by encoding each character into a byte array correctly, as in
this code.
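To make the truncation concrete, here is a minimal sketch of what a
narrowing cast does to one Japanese character. The class and method
names (TruncationDemo, truncate) are hypothetical and only mimic the
cast described above; this is not JRuby's actual code.

```java
public class TruncationDemo {
    // Hypothetical stand-in for the lossy cast: a narrowing conversion
    // from char to byte keeps only the low 8 bits of the 16-bit value.
    static byte truncate(char c) {
        return (byte) c;
    }

    public static void main(String[] args) {
        char c = '\u3053'; // HIRAGANA LETTER KO, the first character of "konnichiwa"
        byte b = truncate(c);
        // Prints: char U+3053 -> byte 0x53
        System.out.printf("char U+%04X -> byte 0x%02X%n", (int) c, b & 0xFF);
    }
}
```

The surviving byte 0x53 is the ASCII code for 'S', so the original
character cannot be recovered from it.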
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.logging.Level;
import java.util.logging.Logger;

public class CharacterTest {
    public static void main(String[] args) throws UnsupportedEncodingException {
        char[] hello = {'こ', 'ん', 'に', 'ち', 'は'}; // Japanese characters are here.
        // Platform encoding as reported by the JVM.
        String defaultEncodingName = System.getProperty("sun.jnu.encoding");
        for (char c : hello) {
            // Encode each char to bytes, then decode it back for printing.
            byte[] bytes = getByteArrayFromChar(c, defaultEncodingName);
            System.out.print(new String(bytes, defaultEncodingName));
        }
    }

    // Encodes a single char in the named charset; returns null if it cannot be encoded.
    private static byte[] getByteArrayFromChar(char c, String encodingName) {
        try {
            CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
            CharBuffer cbuf = CharBuffer.allocate(1);
            cbuf.put(c);
            cbuf.flip();
            ByteBuffer buf = encoder.encode(cbuf);
            int nbytes = buf.limit();
            byte[] encodedBytes = new byte[nbytes];
            buf.get(encodedBytes);
            return encodedBytes;
        } catch (CharacterCodingException ex) {
            Logger.getLogger(CharacterTest.class.getName()).log(Level.SEVERE, null, ex);
        }
        return null;
    }
}
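
As a possible variation, here is a minimal sketch that encodes a whole
String at once instead of one char at a time. It assumes UTF-8 rather
than the platform encoding, and the names WholeStringEncoding and
encode are hypothetical. Encoding the whole string may matter for
characters outside the BMP, which occupy two chars (a surrogate pair)
and cannot be encoded one char at a time.

```java
import java.nio.charset.Charset;

public class WholeStringEncoding {
    // Encodes the entire string in one call, so multi-char sequences
    // such as surrogate pairs are handled by the encoder as a unit.
    static byte[] encode(String s, Charset charset) {
        return s.getBytes(charset);
    }

    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8"); // assumption: UTF-8, not sun.jnu.encoding
        String hello = "\u3053\u3093\u306B\u3061\u306F"; // the same five hiragana as above
        byte[] bytes = encode(hello, utf8);
        System.out.println(bytes.length);                          // prints 15
        System.out.println(new String(bytes, utf8).equals(hello)); // prints true
    }
}
```

Each of these hiragana takes three bytes in UTF-8, which is why five
characters become 15 bytes and still round-trip without loss.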
-Yoko