[jruby-dev] [jira] Created: (JRUBY-2982) Unicode regular expressions by UTF-8 don't work

Yoko Harada (JIRA) Tue, 09 Sep 2008 08:59:45 -0700

Unicode regular expressions by UTF-8 don't work
-----------------------------------------------


                 Key: JRUBY-2982
                 URL: http://jira.codehaus.org/browse/JRUBY-2982
             Project: JRuby
          Issue Type: Bug
          Components: Core Classes/Modules
    Affects Versions: JRuby 1.1.4
         Environment: Linux, OSX 10.5.4
            Reporter: Yoko Harada


Unicode regular expressions that use the property names described in
Oniguruma's documentation don't work if the script file is saved in UTF-8
encoding. For example, the script below raises an exception: "invalid
character property name {Katakana}: /\p{Katakana}/ (RegexpError)".

{code}
# -*- coding: UTF-8 -*-

$KCODE = "utf8"

p 'abc&#12450;&#12452;&#12454;&#945;&#946;&#947;'.scan(/[a-z]/)
p 'abc&#12450;&#12452;&#12454;&#945;&#946;&#947;'.scan(/\p{Katakana}/)
p 'abc&#12450;&#12452;&#12454;&#945;&#946;&#947;'.scan(/\p{^Greek}/)
p 'abc&#12450;&#12452;&#12454;&#945;&#946;&#947;'.scan(/[\u0370-\u30FF]/)
{code}

The pattern "/\p{Katakana}/u" raised the same exception.

In contrast, current Ruby 1.9 (ruby 1.9.0 (2008-08-26 revision 18849)
[i386-darwin9.4.0]) outputs:

{code}
warning: variable $KCODE is no longer effective; ignored
["a", "b", "c"]
["&#12450;", "&#12452;", "&#12454;"]
["a", "b", "c", "&#12450;", "&#12452;", "&#12454;"]
["&#12450;", "&#12452;", "&#12454;", "&#945;", "&#946;", "&#947;"]
{code}

When I recompiled JRuby 1.1.4 with joni's USE_UNICODE_PROPERTIES option set to
true, the Unicode property name expressions worked just as they do in Ruby 1.9.
The last expression, the Unicode codepoint range, still didn't work after
recompiling.



        
