String#unpack with U pattern is broken with multi-byte sequences
----------------------------------------------------------------

                 Key: JRUBY-1788
                 URL: http://jira.codehaus.org/browse/JRUBY-1788
             Project: JRuby
          Issue Type: Bug
         Environment: Latest JRuby 1.1b1
            Reporter: Vladimir Sizikov
            Assignee: Thomas E Enebo
         Attachments: string-unpack-U.patch

Consider the following example:

{noformat}
"\xC2\x80".unpack("U")
{noformat}

MRI returns [128].

JRuby returns:
ArgumentError: malformed UTF-8 character

In fact, JRuby rejects almost any input that contains multi-byte UTF-8 sequences.
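
To make the scope of the failure concrete, here is a small sketch of further inputs that would be expected to decode cleanly. The three- and four-byte samples and the variable names are illustrative assumptions and are not taken from the failing spec; the strings are written as byte escapes so the source encoding does not matter.

```ruby
# Illustrative inputs only (not from the original report, except the first).
two_byte   = "\xC2\x80"          # U+0080, the case from this issue
three_byte = "\xE2\x82\xAC"      # U+20AC (euro sign)
four_byte  = "\xF0\x9F\x98\x80"  # U+1F600

# "U" decodes a single UTF-8 character, "U*" decodes all of them.
p two_byte.unpack("U")
p three_byte.unpack("U")
p four_byte.unpack("U")
p (two_byte + three_byte).unpack("U*")

# pack("U") is the inverse direction: code point to UTF-8 bytes.
p [128].pack("U").bytes
```

On an implementation that handles the U directive correctly, none of these calls should raise ArgumentError.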

This also leads to one Rubinius spec failure.

The attached patch (string-unpack-U.patch) fixes the problem.

The patch is a faithful conversion of MRI's algorithm to Java.
(Also thanks to Marcin for valuable discussions on the subject! :) )
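
For readers unfamiliar with what the U directive has to do, the following is a minimal sketch of UTF-8 decoding written in Ruby. It is not the code from the patch and not MRI's actual implementation: the helper names `decode_utf8_at` and `unpack_u` are hypothetical, only sequences of up to four bytes are handled, and overlong encodings are not checked.

```ruby
# Hypothetical helper, not taken from the patch: decode one UTF-8 sequence
# starting at index `pos` of a byte array; returns [codepoint, bytes_consumed].
def decode_utf8_at(bytes, pos)
  lead = bytes[pos]
  return [lead, 1] if lead < 0x80              # single byte (ASCII)

  # The lead byte tells us the sequence length and carries the top bits.
  if (lead & 0xE0) == 0xC0
    len = 2
    cp  = lead & 0x1F
  elsif (lead & 0xF0) == 0xE0
    len = 3
    cp  = lead & 0x0F
  elsif (lead & 0xF8) == 0xF0
    len = 4
    cp  = lead & 0x07
  else
    raise ArgumentError, "malformed UTF-8 character"
  end

  if pos + len > bytes.size
    raise ArgumentError, "malformed UTF-8 character (truncated)"
  end

  # Each continuation byte must look like 10xxxxxx and adds six bits.
  (1...len).each do |i|
    b = bytes[pos + i]
    raise ArgumentError, "malformed UTF-8 character" unless (b & 0xC0) == 0x80
    cp = (cp << 6) | (b & 0x3F)
  end

  [cp, len]
end

# Hypothetical equivalent of str.unpack("U*"): decode every character.
def unpack_u(str)
  bytes = str.bytes
  out = []
  pos = 0
  while pos < bytes.size
    cp, len = decode_utf8_at(bytes, pos)
    out << cp
    pos += len
  end
  out
end

p unpack_u("\xC2\x80")
```

For the input from this issue, the lead byte 0xC2 announces a two-byte sequence and contributes the bits 00010; the continuation byte 0x80 contributes 000000, which together give code point 128.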

