Issue Type: Bug
Affects Versions: JRuby 1.7.4
Assignee: Unassigned
Components: Encoding
Created: 21/Aug/13 8:01 AM
Description:

On JRuby 1.7.4, the script below outputs:

["a", "ASCII-8BIT"]
["aa", "US-ASCII"]
["☃", "UTF-8"]

and on MRI 1.9.3p392:

["a", "UTF-8"]
["aa", "UTF-8"]
["☃", "UTF-8"]

That would be acceptable if the results were only US-ASCII and UTF-8, but one-character strings come back encoded as ASCII-8BIT! This is totally unexpected, and it breaks code that quite reasonably expects UTF-8 or 7-bit clean text.

--------------------------------

# coding: utf-8

require 'rexml/document'

# Parse a one-element UTF-8 document for each sample string and print
# the text REXML returns together with the name of its encoding.
["a", "aa", "☃"].each do |string|
  doc = REXML::Document.new(%Q!<?xml version="1.0" encoding="UTF-8"?><string>#{string}</string>!)
  decoded_string = doc.elements["string"].text
  p [decoded_string, decoded_string.encoding.name]
end
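Until this is fixed, callers that need a single encoding could relabel the text REXML returns as UTF-8. The sketch below assumes the document is declared UTF-8 (as in the script above), so the bytes in `decoded_string` are already valid UTF-8 and only their encoding tag differs. `normalize_utf8` is a hypothetical helper, not part of REXML or JRuby. It uses `String#force_encoding`, which changes the label without converting any bytes, and `valid_encoding?` to reject input that is not actually UTF-8.

```ruby
# coding: utf-8

# Hypothetical workaround helper (not part of REXML or JRuby).
# Assumes the XML was declared UTF-8, so the returned bytes are UTF-8
# and only the encoding tag (ASCII-8BIT / US-ASCII) needs correcting.
def normalize_utf8(text)
  return nil if text.nil? # Element#text returns nil when there is no text node

  # dup so the caller's string is not mutated; force_encoding only relabels.
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  unless utf8.valid_encoding?
    raise ArgumentError, "text is not valid UTF-8: #{text.inspect}"
  end
  utf8
end

# Usage: simulate the encodings JRuby 1.7.4 reported for the three samples.
samples = [
  "a".dup.force_encoding(Encoding::ASCII_8BIT),
  "aa".dup.force_encoding(Encoding::US_ASCII),
  "\u2603" # snowman, already tagged UTF-8
]

samples.each do |s|
  fixed = normalize_utf8(s)
  p [fixed, fixed.encoding.name]
end
```

In the reproduction script this would mean writing `decoded_string = normalize_utf8(doc.elements["string"].text)`, which should then report UTF-8 for all three samples, provided the assumption about the bytes holds.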

Project: JRuby
Priority: Major
Reporter: Ben Summers