Scott Gonyea created JRUBY-6668:
-----------------------------------

             Summary: StringScanner#scan_until spins forever on UTF-8 data
                 Key: JRUBY-6668
                 URL: https://jira.codehaus.org/browse/JRUBY-6668
             Project: JRuby
          Issue Type: Bug
    Affects Versions: JRuby 1.6.6
         Environment: Mac OS X Lion.
java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04-415-11M3635)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-415, mixed mode)

JRuby 1.6.5 / 1.6.6
            Reporter: Scott Gonyea
            Assignee: Thomas E Enebo


While running the tests in the Ruby library 'mustache' (link: https://github.com/defunkt/mustache), one test in particular is failing:

https://github.com/defunkt/mustache/blob/master/test/mustache_test.rb#L510-522

JRuby hangs when calling StringScanner#scan_until here:

https://github.com/defunkt/mustache/blob/master/lib/mustache/parser.rb#L231

You can reproduce the issue with the following:

    require 'strscan'

    regex = /(^[ \t]*)?\{\{/
    text  = "<h1>中文 {{test}}</h1>\n\n{{> utf8_partial}}\n"
    text.force_encoding 'BINARY'

    scanner = StringScanner.new(text)
    scanner.scan_until(regex)
    # Fans spin up, and this method never returns.

This seems to happen regardless of whether JRuby is in 1.8 or 1.9 mode. I am running this test like so:

    JRUBY_OPTS=--1.9 ruby -I"lib:test" test/mustache_test.rb -n test_utf8 -v

I've also run it with:

    JRUBY_OPTS="--1.9 LC_ALL=en_US.UTF-8"

It appears that this affects UTF-8 characters. If I replace the Chinese characters with "foo bar", then there is no problem.
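For anyone who wants to run the reproduction without hanging their terminal, here is a minimal sketch that wraps the same scan in a worker thread with a deadline. The helper name scan_with_deadline, the 5-second limit, and the :timed_out marker are my own illustration and not part of strscan or mustache. It also assumes an affected JRuby can still kill the spinning thread, which I have not verified.

```ruby
# encoding: utf-8
require 'strscan'

# Same pattern as in the report above.
REGEX = /(^[ \t]*)?\{\{/

# Hypothetical helper (not part of strscan): run scan_until on a worker
# thread and give up after `seconds` instead of blocking the caller.
def scan_with_deadline(text, seconds = 5)
  worker = Thread.new { StringScanner.new(text).scan_until(REGEX) }
  if worker.join(seconds)   # join returns nil when the deadline passes
    worker.value            # the scan finished: return its result
  else
    worker.kill             # assumption: the runtime can stop the spinning thread
    :timed_out
  end
end

utf8   = "<h1>中文 {{test}}</h1>\n\n{{> utf8_partial}}\n"
binary = utf8.dup.force_encoding('BINARY')   # same bytes, relabelled as binary
ascii  = "<h1>foo bar {{test}}</h1>\n\n{{> utf8_partial}}\n"

ascii_result  = scan_with_deadline(ascii)
binary_result = scan_with_deadline(binary)

puts "ascii:  #{ascii_result.inspect}"
puts "binary: #{binary_result.inspect}"
```

If binary comes back as :timed_out while ascii returns normally, that matches the behaviour described in this report. Note that force_encoding only relabels the string: the bytes stay the same, so the two Chinese characters are seen as six separate bytes instead of two characters.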