UTF-8 char in XML hangs in Joni
-------------------------------

                 Key: JRUBY-6204
                 URL: https://jira.codehaus.org/browse/JRUBY-6204
             Project: JRuby
          Issue Type: Bug
    Affects Versions: JRuby 1.6.5, JRuby 1.6.4
         Environment: Nokogiri 1.5.0
                      export JRUBY_OPTS="--1.9"
            Reporter: Anders Bengtsson
            Assignee: Thomas E Enebo
         Attachments: regexp_killer.rb

In 1.9 mode, when a UTF-8 character is present in an XML string, Nokogiri does some regexp work that gets stuck within Joni:

"main" prio=5 tid=7fcbdc801000 nid=0x10971c000 runnable [10971a000]
   java.lang.Thread.State: RUNNABLE
        at org.joni.Matcher.matchCheck(Matcher.java:293)
        at org.joni.Matcher.search(Matcher.java:461)
        at org.jruby.RubyRegexp.search(RubyRegexp.java:1489)
        at org.jruby.RubyRegexp.op_match(RubyRegexp.java:1406)
        at org.jruby.ast.Match3Node.interpret(Match3Node.java:101)
        at org.jruby.ast.OrNode.interpret(OrNode.java:98)
        at org.jruby.ast.IfNode.interpret(IfNode.java:111)
        at org.jruby.ast.LocalAsgnNode.interpret(LocalAsgnNode.java:123)
        at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
        at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
        at org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:75)
        at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:190)
        at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:179)
        at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:312)
        at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:169)
        at regexp_killer.__file__(regexp_killer.rb:4)
        at regexp_killer.load(regexp_killer.rb)
        at org.jruby.Ruby.runScript(Ruby.java:693)
        at org.jruby.Ruby.runScript(Ruby.java:686)
        at org.jruby.Ruby.runNormally(Ruby.java:593)
        at org.jruby.Ruby.runFromMain(Ruby.java:442)
        at org.jruby.Main.doRunFromMain(Main.java:321)
        at org.jruby.Main.internalRun(Main.java:241)
        at org.jruby.Main.run(Main.java:207)
        at org.jruby.Main.run(Main.java:191)
        at org.jruby.Main.main(Main.java:171)

The following script works in 1.8 mode, but hangs in 1.9 mode on 1.6.4, 1.6.5 and HEAD:

    #encoding: utf-8
    require 'nokogiri'
    xml = %q{<?xml version="1.0" encoding="UTF-8"?><hörna/>}
    parsed_xml = Nokogiri.parse(xml)
    puts "done!"

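When narrowing this down, it may help to first confirm that the input string really is valid UTF-8 as seen by the 1.9-mode runtime, so that a malformed string can be ruled out. The sketch below is not part of regexp_killer.rb: `describe_string` is a hypothetical helper introduced here for illustration, and it uses only core String methods available in 1.9 mode (`encoding`, `valid_encoding?`, `bytesize`, `ascii_only?`). It does not touch Nokogiri or the regexp that hangs.

```ruby
# encoding: utf-8

# Hypothetical helper (an assumption for illustration, not from the report):
# summarizes how the runtime sees a string in 1.9 mode.
def describe_string(str)
  {
    :encoding  => str.encoding.name,    # source encoding applied to the literal
    :valid     => str.valid_encoding?,  # false would point at malformed bytes
    :bytes     => str.bytesize,         # length in bytes
    :chars     => str.length,           # length in characters
    :non_ascii => str.each_char.reject { |c| c.ascii_only? }
  }
end

# Same XML literal as in the report.
xml = %q{<?xml version="1.0" encoding="UTF-8"?><hörna/>}
info = describe_string(xml)
puts info.inspect
```

If `:valid` is true and `:bytes` is larger than `:chars`, the string is well-formed multibyte UTF-8, which would suggest the hang is in the regexp matching rather than in the input itself.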