Scott Gonyea created JRUBY-6668:
-----------------------------------
Summary: StringScanner#scan_until spins forever on UTF-8 data
Key: JRUBY-6668
URL: https://jira.codehaus.org/browse/JRUBY-6668
Project: JRuby
Issue Type: Bug
Affects Versions: JRuby 1.6.6
Environment: Mac OS X Lion.
java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04-415-11M3635)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-415, mixed mode)
JRuby 1.6.5 / 1.6.6
Reporter: Scott Gonyea
Assignee: Thomas E Enebo
While running the tests in the Ruby library 'mustache' (link:
https://github.com/defunkt/mustache), one test in particular fails:
https://github.com/defunkt/mustache/blob/master/test/mustache_test.rb#L510-522
JRuby hangs when calling StringScanner#scan_until here:
https://github.com/defunkt/mustache/blob/master/lib/mustache/parser.rb#L231
You can reproduce the issue with the following:
require 'strscan'
regex = /(^[ \t]*)?\{\{/
text = "<h1>中文 {{test}}</h1>\n\n{{> utf8_partial}}\n"
text.force_encoding 'BINARY'
scanner = StringScanner.new(text)
scanner.scan_until(regex) # Fans spin up, and this method never returns.
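Because the call never returns, any test that invokes it directly blocks indefinitely. The following is a minimal sketch, assuming a Ruby 1.9-compatible interpreter, that runs the same reproduction in a worker thread and gives up after a fixed deadline. The 5-second limit is an arbitrary choice, and `scan_with_deadline` is a hypothetical helper, not part of strscan:

```ruby
# encoding: utf-8
require 'strscan'

# Hypothetical helper: run scan_until in a separate thread and wait at most
# +seconds+ for it, so a spinning scanner is reported instead of blocking
# the caller forever. The default of 5 seconds is an arbitrary assumption.
def scan_with_deadline(text, regex, seconds = 5)
  scanner = StringScanner.new(text)
  worker = Thread.new { scanner.scan_until(regex) }
  # Thread#join(limit) returns nil if the thread is still running after
  # +limit+ seconds; the stuck thread is left running, not killed.
  raise "scan_until did not return within #{seconds}s" if worker.join(seconds).nil?
  [worker.value, scanner]
end

regex = /(^[ \t]*)?\{\{/
text = "<h1>中文 {{test}}</h1>\n\n{{> utf8_partial}}\n"
text.force_encoding 'BINARY'

result, scanner = scan_with_deadline(text, regex)
puts result.inspect
puts scanner.pos
```

Waiting with Thread#join rather than Timeout means the check does not rely on JRuby being able to interrupt the spinning scan. On an interpreter without the bug, the call should return the first 13 bytes of the text ("<h1>", the six bytes of "中文", a space, and "{{") and leave pos at 13.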
This happens regardless of whether JRuby is in 1.8 or 1.9 mode.
I am running this test like so:
JRUBY_OPTS=--1.9 ruby -I"lib:test" test/mustache_test.rb -n test_utf8 -v
I've also run it with: JRUBY_OPTS="--1.9 LC_ALL=en_US.UTF-8"
The hang appears to be triggered by the multibyte UTF-8 characters: if I
replace the Chinese characters with "foo bar", scan_until returns normally.
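The comparison described in the previous paragraph can be scripted so that every variant is checked in one run. This sketch scans the template with and without the Chinese characters, tagged both as UTF-8 and as BINARY, and records either the byte offset just past the first "{{" or :timeout. The variant labels, the `scan_pos` helper and the 5-second budget are illustrative assumptions, not part of the original report:

```ruby
# encoding: utf-8
require 'strscan'

REGEX = /(^[ \t]*)?\{\{/

# The ASCII control case and the failing case from the report.
VARIANTS = {
  'ascii/utf-8'   => "<h1>foo bar {{test}}</h1>\n\n{{> utf8_partial}}\n",
  'chinese/utf-8' => "<h1>中文 {{test}}</h1>\n\n{{> utf8_partial}}\n"
}

# Hypothetical helper: returns the scanner's byte position after the first
# match, or :timeout if scan_until has not returned within +seconds+.
def scan_pos(text, seconds = 5)
  scanner = StringScanner.new(text)
  worker = Thread.new { scanner.scan_until(REGEX) }
  return :timeout if worker.join(seconds).nil?
  scanner.pos
end

results = {}
VARIANTS.each do |name, text|
  results[name] = scan_pos(text)
  # Same bytes, re-tagged as BINARY as in the reproduction above.
  results[name.sub('utf-8', 'binary')] = scan_pos(text.dup.force_encoding('BINARY'))
end
results.each { |name, pos| puts "#{name}: #{pos.inspect}" }
```

Given the behaviour in this report, the Chinese variants are the ones expected to time out on JRuby 1.6.6. A working scanner reports 14 for both ASCII variants and 13 for both Chinese variants, because StringScanner#pos counts bytes rather than characters.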