UTF-8 char in XML hangs in Joni
-------------------------------
Key: JRUBY-6204
URL: https://jira.codehaus.org/browse/JRUBY-6204
Project: JRuby
Issue Type: Bug
Affects Versions: JRuby 1.6.5, JRuby 1.6.4
Environment: Nokogiri 1.5.0
export JRUBY_OPTS="--1.9"
Reporter: Anders Bengtsson
Assignee: Thomas E Enebo
Attachments: regexp_killer.rb
In 1.9-mode, when a UTF-8 character is present in an XML string, Nokogiri does
some regexp work that gets stuck within Joni:
"main" prio=5 tid=7fcbdc801000 nid=0x10971c000 runnable [10971a000]
java.lang.Thread.State: RUNNABLE
at org.joni.Matcher.matchCheck(Matcher.java:293)
at org.joni.Matcher.search(Matcher.java:461)
at org.jruby.RubyRegexp.search(RubyRegexp.java:1489)
at org.jruby.RubyRegexp.op_match(RubyRegexp.java:1406)
at org.jruby.ast.Match3Node.interpret(Match3Node.java:101)
at org.jruby.ast.OrNode.interpret(OrNode.java:98)
at org.jruby.ast.IfNode.interpret(IfNode.java:111)
at org.jruby.ast.LocalAsgnNode.interpret(LocalAsgnNode.java:123)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:75)
at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:190)
at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:179)
at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:312)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:169)
at regexp_killer.__file__(regexp_killer.rb:4)
at regexp_killer.load(regexp_killer.rb)
at org.jruby.Ruby.runScript(Ruby.java:693)
at org.jruby.Ruby.runScript(Ruby.java:686)
at org.jruby.Ruby.runNormally(Ruby.java:593)
at org.jruby.Ruby.runFromMain(Ruby.java:442)
at org.jruby.Main.doRunFromMain(Main.java:321)
at org.jruby.Main.internalRun(Main.java:241)
at org.jruby.Main.run(Main.java:207)
at org.jruby.Main.run(Main.java:191)
at org.jruby.Main.main(Main.java:171)
The following script works in 1.8 mode, but hangs in 1.9 mode on JRuby 1.6.4, 1.6.5 and HEAD:
#encoding: utf-8
require 'nokogiri'
xml = %q{<?xml version="1.0" encoding="UTF-8"?><hörna/>}
parsed_xml = Nokogiri.parse(xml)
puts "done!"
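
When checking for this kind of hang, it may help to wrap the suspect call in a timeout so that a stuck match is reported instead of blocking the process. The following is only a sketch: the helper name `completes_within?` is hypothetical (it is not part of JRuby or Nokogiri), the regexp is a stand-in rather than the one Nokogiri uses, and it assumes the stdlib `Timeout` is able to interrupt the stuck match, which may not hold on the affected JRuby versions.

```ruby
# encoding: utf-8
require 'timeout'

# Hypothetical helper (not part of JRuby or Nokogiri): runs the block and
# returns true if it finishes within `seconds`, false if it times out.
def completes_within?(seconds)
  Timeout.timeout(seconds) { yield }
  true
rescue Timeout::Error
  false
end

# Same XML string as in the report, matched with a stand-in regexp
# (an assumption; this is not the pattern Nokogiri runs internally).
xml = %q{<?xml version="1.0" encoding="UTF-8"?><hörna/>}
finished = completes_within?(5) { xml =~ /<([^\/>?]+)\/>/ }
puts(finished ? "done!" : "regexp match did not finish within 5 seconds")
```

The same helper could wrap the `Nokogiri.parse(xml)` call from the script above when Nokogiri is installed.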