Unicode regular expressions by UTF-8 don't work
-----------------------------------------------
Key: JRUBY-2982
URL: http://jira.codehaus.org/browse/JRUBY-2982
Project: JRuby
Issue Type: Bug
Components: Core Classes/Modules
Affects Versions: JRuby 1.1.4
Environment: Linux, OSX 10.5.4
Reporter: Yoko Harada
Unicode regular expressions that use the property names described in Oniguruma's
documentation don't work if the script file is saved in UTF-8 encoding. For
example, the following script raises the exception "invalid character property
name {Katakana}: /\p{Katakana}/ (RegexpError)":
{code}
# -*- coding: UTF-8 -*-
$KCODE = "utf8"
p 'abcアイウαβγ'.scan(/[a-z]/)
p 'abcアイウαβγ'.scan(/\p{Katakana}/)
p 'abcアイウαβγ'.scan(/\p{^Greek}/)
p 'abcアイウαβγ'.scan(/[\u0370-\u30FF]/)
{code}
Adding the u flag, as in /\p{Katakana}/u, raises the same exception.
By contrast, current Ruby 1.9 (ruby 1.9.0 (2008-08-26 revision 18849)
[i386-darwin9.4.0]) outputs:
{code}
warning: variable $KCODE is no longer effective; ignored
["a", "b", "c"]
["ア", "イ", "ウ"]
["a", "b", "c", "ア", "イ", "ウ"]
["ア", "イ", "ウ", "α", "β", "γ"]
{code}
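To compare interpreters pattern by pattern, instead of stopping at the first
exception, something like the following probe could be used. It is only a
sketch: probe_patterns is a hypothetical helper name, not part of JRuby or
joni, and it assumes the interpreter reports an unsupported construct by
raising RegexpError from Regexp.new.

```ruby
# Hypothetical helper (not from the original report): try to compile each
# pattern source and record either :ok or the RegexpError message.
def probe_patterns(sources)
  results = {}
  sources.each do |src|
    begin
      Regexp.new(src)
      results[src] = :ok
    rescue RegexpError => e
      results[src] = e.message
    end
  end
  results
end

# Single-quoted strings keep the backslashes, so Regexp.new sees the same
# pattern text as the regexp literals in the script above.
PATTERNS = ['[a-z]', '\p{Katakana}', '\p{^Greek}', '[\u0370-\u30FF]']

probe_patterns(PATTERNS).each do |src, status|
  puts "#{src}: #{status}"
end
```

Running the same file under each interpreter gives one status line per
pattern, which makes it easier to see which constructs are affected.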
When I recompiled JRuby 1.1.4 with joni's USE_UNICODE_PROPERTIES option set to
true, the Unicode property name expressions worked as they do in Ruby 1.9. The
last expression, the Unicode codepoint range, still didn't work after
recompiling.
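For the codepoint range, one possible way to avoid the \u escape is to build
the character class from the codepoints themselves. This is a sketch only:
codepoint_range is a hypothetical helper name, and it has not been verified
against JRuby 1.1.4, where a UTF-8 regexp option may also be needed.

```ruby
# -*- coding: UTF-8 -*-
# Hypothetical helper (not from the original report): build a character
# class covering first..last without using \u escapes in the pattern.
def codepoint_range(first, last)
  # pack('U') turns a codepoint into its UTF-8 encoded character.
  lo = Regexp.escape([first].pack('U'))
  hi = Regexp.escape([last].pack('U'))
  Regexp.new("[#{lo}-#{hi}]")
end

p 'abcアイウαβγ'.scan(codepoint_range(0x0370, 0x30FF))
```

The range 0x0370-0x30FF is the same one used in the last line of the script
above, so the result can be compared directly with the Ruby 1.9 output.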