Unicode regular expressions by UTF-8 don't work
-----------------------------------------------

                 Key: JRUBY-2982
                 URL: http://jira.codehaus.org/browse/JRUBY-2982
             Project: JRuby
          Issue Type: Bug
          Components: Core Classes/Modules
    Affects Versions: JRuby 1.1.4
         Environment: Linux, OSX 10.5.4
            Reporter: Yoko Harada


Unicode regular expressions that use the property names described in Oniguruma's 
documentation don't work when the script file is saved in UTF-8 encoding. For 
example, the script below raises the exception "invalid character property name 
{Katakana}: /\p{Katakana}/ (RegexpError)".

{code}
# -*- coding: UTF-8 -*-

$KCODE = "utf8"

p 'abcアイウαβγ'.scan(/[a-z]/)
p 'abcアイウαβγ'.scan(/\p{Katakana}/)
p 'abcアイウαβγ'.scan(/\p{^Greek}/)
p 'abcアイウαβγ'.scan(/[\u0370-\u30FF]/)
{code}

"/\p{Katakana}/u" raised the same exception, too.

By contrast, current Ruby 1.9 (ruby 1.9.0 (2008-08-26 revision 18849) 
[i386-darwin9.4.0]) outputs:

{code}
warning: variable $KCODE is no longer effective; ignored
["a", "b", "c"]
["ア", "イ", "ウ"]
["a", "b", "c", "ア", "イ", "ウ"]
["ア", "イ", "ウ", "α", "β", "γ"]
{code}

When I recompiled JRuby 1.1.4 with joni's USE_UNICODE_PROPERTIES option set to 
true, the Unicode property name expressions worked just as they do in Ruby 1.9. 
The last expression, the Unicode code point range, still didn't work after 
recompiling.
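
To check quickly whether a given build accepts a property name, a small probe 
along the following lines may help. This is only a sketch: the helper name 
unicode_property_supported? is hypothetical (it is not part of JRuby or joni), 
and it assumes that an unsupported name surfaces as a RegexpError when the 
pattern is compiled at runtime, as in the exception quoted above.

```ruby
# Hypothetical helper, not part of JRuby or joni: reports whether the running
# interpreter can compile a regular expression that uses \p{<name>}.
def unicode_property_supported?(name)
  # Compile at runtime from a string so that an unsupported property name
  # raises RegexpError here instead of aborting the script at parse time.
  Regexp.new("\\p{#{name}}")
  true
rescue RegexpError
  false
end

# Probe the two property names used in the report.
%w[Katakana Greek].each do |name|
  puts "#{name}: #{unicode_property_supported?(name)}"
end
```

Compiling the pattern from a string rather than a literal is deliberate: a 
literal such as /\p{Katakana}/ is compiled when the file is parsed, so the error 
could not be rescued from within the same file.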

