Charles Oliver Nutter created JRUBY-6403:
--------------------------------------------

             Summary: Regexp + encoding errors in REXML
                 Key: JRUBY-6403
                 URL: https://jira.codehaus.org/browse/JRUBY-6403
             Project: JRuby
          Issue Type: Bug
          Components: Core Classes/Modules
            Reporter: Charles Oliver Nutter


The attached script produces encoding mismatch errors from regexp matching 
inside REXML. It then fails again when it eventually tries to construct the 
exception, because the contents of the message are incorrectly encoded. I had 
to add some logging to rexml's parseexception.rb to get the actual errors to 
print out:

{noformat}
diff --git a/lib/ruby/1.9/rexml/parseexception.rb b/lib/ruby/1.9/rexml/parseexception.rb
index 0c4d55a..9a2d885 100644
--- a/lib/ruby/1.9/rexml/parseexception.rb
+++ b/lib/ruby/1.9/rexml/parseexception.rb
@@ -21,6 +21,11 @@ module REXML
       end
 
       # Get the stack trace and error message
+      puts err
+      p err.encoding
+      s = super
+      puts s
+      p s.encoding
       err << super
 
       # Add contextual information
{noformat}

It seems we're still having some encoding mismatch problems.

My full output (with extra logging) follows:

{noformat}
UTF-8
#<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 
regexp with ASCII-8BIT string)>
org/jruby/RubyRegexp.java:1504:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/source.rb:210:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:419:in 
`pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in 
`pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in 
`parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in
 `parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in
 `xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in
 `xml_in'
test.rb:13:in `(root)'
...
#<Encoding:UTF-8>
Exception parsing
#<Encoding:US-ASCII>
#<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 
regexp with ASCII-8BIT string)>
org/jruby/RubyRegexp.java:1504:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/source.rb:210:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:419:in 
`pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in 
`pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in 
`parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in
 `parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in
 `xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in
 `xml_in'
test.rb:13:in `(root)'
...
#<Encoding:UTF-8>
Exception parsing
#<Encoding:US-ASCII>
#<REXML::ParseException: #<Encoding::CompatibilityError: incompatible encoding 
regexp match (UTF-8 regexp with ASCII-8BIT string)>
org/jruby/RubyRegexp.java:1504:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/source.rb:210:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:419:in 
`pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in 
`pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in 
`parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in
 `parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in
 `xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in
 `xml_in'
test.rb:13:in `(root)'
...
Exception parsing
Line: 4
Position: 94
Last 80 unconsumed characters:
     <!-- Savi žemÄ&#151;s unitai -->>
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:427:in 
`pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in 
`pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in 
`parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in
 `parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in
 `xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in
 `xml_in'
test.rb:13:in `(root)'
...
#<Encoding:UTF-8>
#<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 
regexp with ASCII-8BIT string)>
org/jruby/RubyRegexp.java:1504:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/source.rb:210:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:419:in 
`pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in 
`pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in 
`parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in
 `parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in
 `xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in
 `xml_in'
test.rb:13:in `(root)'
...
Exception parsing
Line: 4
Position: 94
Last 80 unconsumed characters:
     <!-- Savi žem&#279;s unitai -->
#<Encoding:ASCII-8BIT>
Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT
   concat at org/jruby/RubyString.java:2521
     to_s at /Users/headius/projects/jruby/lib/ruby/1.9/rexml/parseexception.rb:29
  message at org/jruby/RubyException.java:266
{noformat}

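Note that the last error listed comes from the "err << super" line in 
parseexception.rb: appending the superclass exception's to_s result 
(ASCII-8BIT) to the current error message (UTF-8) raises its own 
Encoding::CompatibilityError.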
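
For reference, both failure modes can be reproduced in isolation on Ruby 1.9 or 
later. This is a minimal sketch and not the attached script: the sample text is 
borrowed from the XML comment in the output above, and encoding_error is a 
hypothetical helper introduced only for this illustration. It assumes the 
underlying problem is a string carrying UTF-8 bytes while tagged as ASCII-8BIT.

```ruby
# Text with non-ASCII characters, written with \u escapes so the literals
# are UTF-8 regardless of the source file's own encoding.
utf8_text   = "Savi \u017Eem\u0117s unitai"
utf8_regexp = /\u017Eem/

# The same bytes, but tagged as binary (ASCII-8BIT), as seen in the log.
binary_text = utf8_text.dup.force_encoding("ASCII-8BIT")

# Hypothetical helper: returns the encoding error raised by the block, or nil.
def encoding_error
  yield
  nil
rescue Encoding::CompatibilityError => e
  e
end

# 1. UTF-8 regexp matched against an ASCII-8BIT string (the source.rb failure).
match_error = encoding_error { utf8_regexp.match(binary_text) }

# 2. ASCII-8BIT string appended to a UTF-8 string (the "err << super" failure).
concat_error = encoding_error { utf8_text.dup << binary_text }

# Retagging the bytes as UTF-8 lets the same match succeed.
retagged_match = utf8_regexp.match(binary_text.dup.force_encoding("UTF-8"))

puts match_error.message
puts concat_error.message
puts retagged_match.inspect
```

Retagging with force_encoding only shows that the bytes themselves are valid 
UTF-8; where the ASCII-8BIT tag is introduced in JRuby or REXML is not 
established here.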
