Re: [jruby-dev] Why don’t using UTF-8 for Ruby internal use?

jia yanping Mon, 28 May 2007 18:57:15 -0700

Sorry for my late reply. My email server had some problems :(




Thanks for your reply :)

We all know Ruby is a good language for DSLs, so I want to use Chinese
identifiers in Ruby, like below:

def 你好
        puts '你好'
end

你好 # call the method



So I studied JRuby's source code. I found that JRuby reads script files in
charset ISO-8859-1 and doesn't care about the file's encoding.

In file src\org\jruby\util\CommandlineParser.java:

   public Reader getScriptSource() {
       try {
           // KCode.NONE is used because KCODE does not affect parse in Ruby 1.8
           // if Ruby 2.0 encoding pragmas are implemented, this will need to change
           if (hasInlineScript) {
               if (scriptFileName != null) {
                   File file = new File(getScriptFileName());
                   return new BufferedReader(new InputStreamReader(
                           new FileInputStream(file), KCode.NONE.decoder()));
               }
               return new StringReader(inlineScript());
           } else if (isSourceFromStdin()) {
               return new InputStreamReader(System.in, KCode.NONE.decoder());
           } else {
               File file = new File(getScriptFileName());
               return new BufferedReader(new InputStreamReader(
                       new FileInputStream(file), KCode.NONE.decoder()));
           }
       } catch (IOException e) {
           throw new MainExitException(1, "Error opening script file: " + e.getMessage());
       }
   }
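To illustrate what decoding with ISO-8859-1 does to a UTF-8 script, here is a minimal sketch. It is not JRuby code, and the class and method names are hypothetical. It decodes the UTF-8 bytes of 你好 in two ways: ISO-8859-1 maps every byte to exactly one char, so the bytes survive but the text is no longer Chinese, while UTF-8 recovers the two characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of JRuby.
final class ScriptDecodingSketch {
    // Decode raw script bytes with an explicitly chosen charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "\u4f60\u597d" is 你好; encoded as UTF-8 it occupies six bytes.
        byte[] raw = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8);

        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(raw, StandardCharsets.UTF_8);

        System.out.println(raw.length);        // 6
        System.out.println(asLatin1.length()); // 6: one char per byte
        System.out.println(asUtf8.length());   // 2: the two Chinese characters

        // ISO-8859-1 is byte-preserving: re-encoding returns the original bytes.
        System.out.println(
                Arrays.equals(raw, asLatin1.getBytes(StandardCharsets.ISO_8859_1))); // true
    }
}
```

The round trip at the end suggests why a byte-oriented reader can get away with ISO-8859-1: no bytes are lost, but any code that inspects individual chars, such as a lexer, sees Latin-1 characters instead of Chinese ones.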

So I changed the code on my computer to read the source using the file's
encoding, and now JRuby recognizes Chinese words correctly. I also changed the
method isIdentifierChar in RubyYaccLexer.java so that Chinese characters are
recognized as part of JRuby identifiers. Finally, I changed JRuby's String
implementation so that JRuby stores strings as bytes encoded in UTF-8.
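As a rough idea of what an identifier check that accepts Chinese could look like, here is a minimal sketch. It is an assumption for illustration only: the class is hypothetical, and this is neither the original isIdentifierChar from RubyYaccLexer.java nor the modified version described above. It relies on the standard Character.isLetterOrDigit(int), which returns true for CJK ideographs.

```java
// Hypothetical sketch; this is not the code from RubyYaccLexer.java.
final class IdentifierCharSketch {
    // Accept '_' plus any Unicode letter or digit, so CJK characters qualify.
    static boolean isIdentifierChar(int codePoint) {
        return codePoint == '_' || Character.isLetterOrDigit(codePoint);
    }

    // True when every code point of name may appear inside an identifier.
    // Simplification: this does not reject a leading digit.
    static boolean isIdentifier(String name) {
        return !name.isEmpty()
                && name.codePoints().allMatch(IdentifierCharSketch::isIdentifierChar);
    }

    public static void main(String[] args) {
        System.out.println(isIdentifier("\u4f60\u597d")); // 你好 -> true
        System.out.println(isIdentifier("hello_world"));  // true
        System.out.println(isIdentifier("a+b"));          // false
    }
}
```

A check like this only works if the lexer sees real characters, which is why it depends on the script being decoded with the file's actual encoding first.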



I think these changes will not break existing code. Is that right?

Maybe there are problems I don't know about, so please tell me :)

If this is doable, can I make a branch for it?

-----Original Message-----

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Charles
Oliver Nutter

Sent: May 19, 2007 3:17

To: [email protected]

Subject: Re: [jruby-dev] why don't using UTF-8 for Ruby internal use?



0-3G-05 贾延平 wrote:

First of all, sorry for my poor English :)

These days I have been trying JRuby with Chinese, and I have a question:

Why not use UTF-8 for Ruby's internal strings, especially with the -Ku option?

We can:

1. Read the Ruby script file using the file's encoding instead of reading
   raw data.
2. Convert strings to Java's UTF-16 when they are read from the file, then
   convert them to UTF-8 bytes for RubyString.
3. So in Ruby, strings will be implemented as bytes encoded in UTF-8.

Because UTF-8 is compatible with ASCII, Ruby script files written in
English are still OK.
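The three steps above can be sketched in plain Java. This is a minimal illustration under assumptions, not JRuby's implementation: the class and method names are hypothetical, and UTF-16BE stands in for "the file's encoding" only because every JDK is required to support it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the three steps; not JRuby's implementation.
final class Utf8PipelineSketch {
    // Decode the file bytes with the file's own charset into a Java (UTF-16)
    // String, then re-encode that String as the UTF-8 bytes that a
    // byte-based Ruby string would hold.
    static byte[] toUtf8(byte[] fileBytes, Charset fileCharset) {
        String decoded = new String(fileBytes, fileCharset);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A script line containing 你好, stored in a non-UTF-8 encoding.
        byte[] fileBytes = "puts '\u4f60\u597d'".getBytes(StandardCharsets.UTF_16BE);
        byte[] utf8 = toUtf8(fileBytes, StandardCharsets.UTF_16BE);
        System.out.println(utf8.length); // 13: seven ASCII bytes plus 2 x 3 bytes

        // ASCII-only source is byte-for-byte identical in US-ASCII and UTF-8.
        byte[] ascii = "puts 'hello'".getBytes(StandardCharsets.US_ASCII);
        System.out.println(
                Arrays.equals(ascii, toUtf8(ascii, StandardCharsets.US_ASCII))); // true
    }
}
```

The second check is the ASCII-compatibility point from the message: a script written only in English produces the same bytes whether or not it passes through this conversion.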




Hello, 贾延平!



Internally, Ruby strings (even in JRuby) are just a collection of bytes. If
you specify -Ku they will be UTF-8 internally by default. If you call a Java
API that returns strings, we'll convert that string into bytes as UTF-8. Is
there a specific problem you're seeing?



我不忍情你的���}。



- Charlie



