UTF-8 characters don't pass through hpricot gracefully since JRuby 1.1.6
------------------------------------------------------------------------

                 Key: JRUBY-3732
                 URL: http://jira.codehaus.org/browse/JRUBY-3732
             Project: JRuby
          Issue Type: Bug
    Affects Versions: JRuby 1.3, JRuby 1.2
         Environment: Linux x64  
jruby 1.3.0 (ruby 1.8.6p287) (2009-06-03 5dc2e22) (Java HotSpot(TM) 64-Bit 
Server VM 1.6.0_02) [amd64-java]
jruby 1.2.0 (ruby 1.8.6 patchlevel 287) (2009-03-16 rev 9419) [amd64-java]
jruby 1.1.6 (ruby 1.8.6 patchlevel 114) (2008-12-17 rev 8388) [amd64-java]
*** LOCAL GEMS ***
hpricot (0.6.164)


            Reporter: David Kellum
            Assignee: Thomas E Enebo


UTF-8 characters no longer pass through hpricot intact in JRuby releases 
after 1.1.6.

The following code sample, saved and run as UTF-8, uses an input string 
containing a Unicode mdash (em dash, U+2014):

{code:ruby}
require 'rubygems'
require 'hpricot'

input = "<p>TUCSON, Ariz. — The driver</p>"
puts input

doc = Hpricot.parse( input )

puts doc.inner_html
{code}

Here is comparative output:

{code}
% ruby ./utf8_sample_2.rb
<p>TUCSON, Ariz. — The driver</p>
<p>TUCSON, Ariz. — The driver</p>
%  /opt/dist/jruby-1.1.6/bin/jruby ./utf8_sample_2.rb
<p>TUCSON, Ariz. — The driver</p>
<p>TUCSON, Ariz. — The driver</p>
%  /opt/dist/jruby-1.2.0/bin/jruby ./utf8_sample_2.rb
<p>TUCSON, Ariz. — The driver</p>
<p>TUCSON, Ariz. â&#128;&#148; The driver</p>
%  /opt/dist/jruby-1.3.0/bin/jruby ./utf8_sample_2.rb
<p>TUCSON, Ariz. — The driver</p>
<p>TUCSON, Ariz. â&#128;&#148; The driver</p>
{code}

JRuby 1.2.0 and 1.3.0 show a mangled mdash (â&#128;&#148;), while ruby and 
JRuby 1.1.6 print it unchanged.
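
The mangled sequence lines up with the three UTF-8 bytes of the mdash 
(E2 80 94, decimal 226 128 148) each being handled as a separate character. 
The sketch below only illustrates that byte arithmetic; it does not call 
hpricot, and the per-byte Latin-1 treatment is an assumption about the 
symptom, not a confirmed diagnosis of what JRuby or hpricot does internally.

```ruby
# U+2014 encoded as UTF-8 is three bytes: E2 80 94.
mdash = [0x2014].pack('U')   # build the character without a literal in the source
bytes = mdash.unpack('C*')   # => [226, 128, 148]

# Assumption for illustration only: treat each byte as its own Latin-1
# code point, and print C1 control values (128..159) as numeric entities.
mangled = bytes.map { |b|
  (128..159).include?(b) ? format('&#%d;', b) : [b].pack('U')
}.join

puts bytes.inspect   # [226, 128, 148]
puts mangled         # â&#128;&#148;
```

The result matches the text printed by JRuby 1.2.0 and 1.3.0 above, which 
suggests the input is being read one byte at a time somewhere between the 
parse and inner_html.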





