Charles Oliver Nutter created JRUBY-6403:
--------------------------------------------
Summary: Regexp + encoding errors in REXML
Key: JRUBY-6403
URL: https://jira.codehaus.org/browse/JRUBY-6403
Project: JRuby
Issue Type: Bug
Components: Core Classes/Modules
Reporter: Charles Oliver Nutter

The attached script produces encoding mismatch errors from regexp matching. It also raises an error when it eventually tries to build the exception message, because the message contents are incorrectly encoded.

I had to add some logging to REXML's parseexception.rb to get the actual errors to print out:

{noformat}
diff --git a/lib/ruby/1.9/rexml/parseexception.rb b/lib/ruby/1.9/rexml/parseexception.rb
index 0c4d55a..9a2d885 100644
--- a/lib/ruby/1.9/rexml/parseexception.rb
+++ b/lib/ruby/1.9/rexml/parseexception.rb
@@ -21,6 +21,11 @@ module REXML
       end
 
       # Get the stack trace and error message
+      puts err
+      p err.encoding
+      s = super
+      puts s
+      p s.encoding
       err << super
 
       # Add contextual information
{noformat}

It seems we're still having some encoding mismatch problems. My full output (with extra logging) follows:

{noformat}
UTF-8
#<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)>
org/jruby/RubyRegexp.java:1504:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/source.rb:210:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:419:in `pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in `pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in `parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in `parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in `xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in `xml_in'
test.rb:13:in `(root)'
...
#<Encoding:UTF-8>
Exception parsing
#<Encoding:US-ASCII>
#<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)>
org/jruby/RubyRegexp.java:1504:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/source.rb:210:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:419:in `pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in `pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in `parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in `parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in `xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in `xml_in'
test.rb:13:in `(root)'
...
#<Encoding:UTF-8>
Exception parsing
#<Encoding:US-ASCII>
#<REXML::ParseException: #<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)>
org/jruby/RubyRegexp.java:1504:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/source.rb:210:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:419:in `pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in `pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in `parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in `parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in `xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in `xml_in'
test.rb:13:in `(root)'
...
Exception parsing
Line: 4
Position: 94
Last 80 unconsumed characters:
<!-- Savi žemÄ—s unitai -->>
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:427:in `pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in `pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in `parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in `parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in `xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in `xml_in'
test.rb:13:in `(root)'
...
#<Encoding:UTF-8>
#<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)>
org/jruby/RubyRegexp.java:1504:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/source.rb:210:in `match'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:419:in `pull_event'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/baseparser.rb:183:in `pull'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/parsers/treeparser.rb:22:in `parse'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:231:in `build'
/Users/headius/projects/jruby/lib/ruby/1.9/rexml/document.rb:43:in `initialize'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:965:in `parse'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:164:in `xml_in'
/Users/headius/projects/jruby/lib/ruby/gems/shared/gems/xml-simple-1.1.1/lib/xmlsimple.rb:203:in `xml_in'
test.rb:13:in `(root)'
...
Exception parsing
Line: 4
Position: 94
Last 80 unconsumed characters:
<!-- Savi emės unitai -->
#<Encoding:ASCII-8BIT>
Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT
  concat at org/jruby/RubyString.java:2521
  to_s at /Users/headius/projects/jruby/lib/ruby/1.9/rexml/parseexception.rb:29
  message at org/jruby/RubyException.java:266
{noformat}

Note that the last error listed is the one from attempting to append the superclass exception's to_s result to the current error.
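Both failures in the log can likely be reproduced in plain Ruby, outside REXML. The following is a minimal sketch, assuming MRI 2.0 or later (for String#b); the sample strings and the helpers regexp_error and concat_error are hypothetical stand-ins, not REXML or JRuby code. The force_encoding step at the end is only one possible workaround, valid only if the bytes really are UTF-8, and is not the fix adopted for this issue.

```ruby
# Minimal sketch (plain MRI Ruby 2.0+, not JRuby/REXML internals) of the two
# Encoding::CompatibilityError cases in the log. All strings and helper names
# below are hypothetical stand-ins for the REXML source data.

# /\u0117/ ("ė") contains a non-ASCII character, so its encoding is fixed to UTF-8.
UTF8_REGEXP = /\u0117/

# The same character as raw bytes tagged ASCII-8BIT (binary).
binary_text = "\xC4\x97".b

# A UTF-8 message that already holds a non-ASCII character (U+017E, "ž").
err = String.new("Exception parsing \u017E")

# Returns the CompatibilityError raised by the match, or nil if it succeeds.
def regexp_error(regexp, text)
  regexp.match(text)
  nil
rescue Encoding::CompatibilityError => e
  e
end

# Returns the CompatibilityError raised by appending right to a copy of left
# (mirroring err << super in parseexception.rb), or nil if it succeeds.
def concat_error(left, right)
  left.dup << right
  nil
rescue Encoding::CompatibilityError => e
  e
end

# Case 1: fixed-encoding UTF-8 regexp vs. binary string with non-ASCII bytes.
puts regexp_error(UTF8_REGEXP, binary_text).message

# Case 2: UTF-8 string with non-ASCII text << binary string with non-ASCII bytes.
puts concat_error(err, binary_text).message

# Encoding.compatible? returns nil when the two strings share no usable encoding.
p Encoding.compatible?(err, binary_text)  # => nil

# Hypothetical workaround: if the bytes are known to be UTF-8, relabel them
# (force_encoding changes only the tag, it does not transcode) and check
# validity before using them.
utf8_text = binary_text.dup.force_encoding(Encoding::UTF_8)
p utf8_text.valid_encoding?             # => true
p regexp_error(UTF8_REGEXP, utf8_text)  # => nil (the match now succeeds)

# "Exception parsing " followed by U+017E and U+0117.
combined = err.dup << utf8_text
```

Both helpers return the exception instead of letting it propagate, so the script runs to completion and prints both messages (the exact wording may vary between Ruby versions). If the bytes are in some other known encoding, String#encode with an explicit source encoding would be the appropriate conversion rather than force_encoding.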