String#unpack with U pattern is broken with multi-byte sequences
----------------------------------------------------------------
Key: JRUBY-1788
URL: http://jira.codehaus.org/browse/JRUBY-1788
Project: JRuby
Issue Type: Bug
Environment: Latest JRuby 1.1b1
Reporter: Vladimir Sizikov
Assignee: Thomas E Enebo
Attachments: string-unpack-U.patch
Consider the following example:
{noformat}
"\xC2\x80".unpack("U")
{noformat}
MRI returns [128].
JRuby returns:
ArgumentError: malformed UTF-8 character
In fact, JRuby rejects almost any input containing multi-byte UTF-8 sequences.
This also leads to one Rubinius spec failure.
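To make the expected behavior concrete, here is a small check of what MRI returns for a few multi-byte inputs. Only the first input comes from this report; the other two, and the name EXPECTED_CODE_POINTS, are illustrative assumptions added to cover a three-byte sequence and mixed ASCII/multi-byte input.

```ruby
# Inputs and the code points MRI is expected to return for them.
# Only the first entry is taken from the report; the rest are illustrative.
EXPECTED_CODE_POINTS = {
  "\xC2\x80"     => [128],      # U+0080, the case from the report
  "\xE2\x82\xAC" => [8364],     # U+20AC, a three-byte sequence
  "a\xC3\xA9"    => [97, 233]   # ASCII "a" followed by U+00E9
}

EXPECTED_CODE_POINTS.each do |input, expected|
  actual = input.unpack("U*")
  status = actual == expected ? "ok" : "MISMATCH"
  puts "#{input.bytes.inspect} -> #{actual.inspect} (#{status})"
end
```

On an affected JRuby build, each multi-byte input would presumably raise the ArgumentError shown above instead of printing a result.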
The attached patch (string-unpack-U.patch) fixes the problem.
It is a faithful conversion of MRI's algorithm to Java.
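For readers unfamiliar with how a multi-byte sequence is decoded, the general idea can be sketched in Ruby as follows. This is not the attached patch and not MRI's actual code: decode_utf8_char is a hypothetical helper, it handles only sequences of up to four bytes, and it does not reject overlong encodings.

```ruby
# Decode the first UTF-8 sequence in +bytes+ (an array of integers 0..255).
# Returns [code_point, bytes_consumed].
# Hypothetical helper for illustration only; no overlong-form check.
def decode_utf8_char(bytes)
  first = bytes[0]
  raise ArgumentError, "empty input" if first.nil?

  # Single-byte (ASCII) character: the byte is the code point.
  return [first, 1] if first < 0x80

  # The lead byte tells us the sequence length and carries the high bits.
  if (first & 0xE0) == 0xC0
    length = 2
    code_point = first & 0x1F
  elsif (first & 0xF0) == 0xE0
    length = 3
    code_point = first & 0x0F
  elsif (first & 0xF8) == 0xF0
    length = 4
    code_point = first & 0x07
  else
    raise ArgumentError, "malformed UTF-8 character"
  end

  raise ArgumentError, "malformed UTF-8 character (truncated)" if bytes.length < length

  # Each continuation byte must look like 10xxxxxx and adds six payload bits.
  bytes[1, length - 1].each do |b|
    raise ArgumentError, "malformed UTF-8 character" unless (b & 0xC0) == 0x80
    code_point = (code_point << 6) | (b & 0x3F)
  end

  [code_point, length]
end

p decode_utf8_char("\xC2\x80".bytes)  # the input from this report
```

For the input from this report, the lead byte 0xC2 contributes the bits 00010 and the continuation byte 0x80 contributes 000000, which combine to code point 128.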
(Also thanks to Marcin for valuable discussions on the subject! :) )