Hi Vincent,

I used force_encoding for testing purposes only, as was suggested 
somewhere.  Using String#encode introduces another problem, so I don't 
want to use it in my app: it seems that I have to call String#encode on 
every String object used with the text read from a file.  This behavior 
is the same as in CRuby 1.9.2.

  File.read("test.txt").force_encoding("UTF-16LE").split("\n")

This script raises Encoding::CompatibilityError even with a magic 
encoding comment at the top of the script, and even when the file's 
encoding is UTF-16LE.
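
For example, just to get the split to run, every literal touching the 
text has to be converted as well (a minimal sketch of the workaround I 
mean; force_encoding here is still only good for testing, since it 
relabels the bytes rather than converting them):

  # The separator must share the text's encoding, otherwise #split
  # raises Encoding::CompatibilityError.
  text  = File.read("test.txt").force_encoding("UTF-16LE")
  lines = text.split("\n".encode("UTF-16LE"))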

Anyway, I found an error in my original post: the MacRuby 0.8 results 
were actually 10 times faster than I reported.

> *Ruby 1.8.7                 0.0019  0.0018  0.0017
> Ruby 1.9.2                  0.029   0.030   0.029
> **MacRuby 0.8               0.0028  0.0025  0.0028
> MacRuby 0.9 2011/01/16      0.18    0.17    0.18
> MacRuby 0.9 2011/01/16      0.0023  0.0029  0.0021
> (with encode("UTF-16LE"))
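
(The numbers are seconds per run.  The test is essentially the loop 
Vincent quotes below, wrapped in a timer along these lines:)

  require 'benchmark'

  text = File.read("test.txt")
  3.times do
    # each timed run takes 1000 character-indexed slices
    puts Benchmark.realtime { 1000.times { |i| a = text[i, i + 30] } }
  end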

So, because of the changes made on 2010/12/17, MacRuby's String behaves 
more like Ruby 1.9.2 than like 1.8.7, and because of the slow object 
allocator this process is slow (slower than 1.9.2)?  I'm not sure what 
changes were made internally, but this is too slow compared to 0.8.

How much faster could the object allocator be, or is it expected to 
get, compared to the current version?  (I assume that optimization will 
come after 1.0.)

My app is a text analysis tool, and KWIC (keyword in context) is its 
main feature.  With my simple test script, the KWIC processing time on 
MacRuby was 20+ times slower than on Ruby 1.8.7/1.9.2 (depending on the 
search word).
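
To give an idea of the access pattern, here is a simplified sketch (not 
my actual app code; the search word and the 30-character context window 
are arbitrary):

  # KWIC sketch: every hit needs a character-indexed slice, so the
  # cost of String#[] on non-ASCII text dominates the whole search.
  text = File.read("test.txt")
  word = "keyword"
  pos  = 0
  while idx = text.index(word, pos)
    left = [idx - 30, 0].max
    puts text[left, idx - left + word.size + 30]
    pos = idx + word.size
  end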

Thanks,
Yasu


On 2011/01/17, at 13:19, Vincent Isambart wrote:

> Hi,
> 
>> Indeed, String#[] will now perform slower on UTF-8 non-ASCII strings, because
>> computing the character index cannot be done in constant time anymore.
>> I don't believe this can be improved using the optimization we implemented
>> for #gsub and #scan. Maybe 1.9.2 has a better optimization; I will let
>> Vincent comment :)
> 
>> text = File.read("test.txt")
>> 1000.times do |i|
>>   a = text[i, i+30]
>> end
> 
> In fact I already use the cache to get the offset for the end index.
> I just had a look at 1.9.2 and what they do is pretty similar to what
> we do. I would not be surprised if the difference was mainly due to
> the object allocator being much slower in MacRuby.
> I would need to run Shark on it to be sure, but I would not expect
> much improvement on String#[] soon.
> 
> And by the way, to try with UTF-16 you should use encode rather than
> force_encoding, and UTF-16LE rather than UTF-16BE:
>
>   text = text.encode(Encoding::UTF_16LE)
>
> UTF-16LE is the faster of the two because the native byte order on
> x86 is little-endian.  On a UTF-8 string, forcing the encoding to
> ASCII or BINARY (ASCII-8BIT) would make sense, since all ASCII
> characters are encoded identically in UTF-8 and ASCII, but forcing it
> to UTF-16 would give you a meaningless string full of strange
> characters.
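
(A quick sketch of the difference Vincent describes, using an arbitrary 
UTF-8 literal:)

  s = "héllo"                        # a UTF-8 string
  s.dup.force_encoding("UTF-16LE")   # relabels the same bytes => mojibake
  s.encode("UTF-16LE")               # converts the bytes => same characters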

_______________________________________________
MacRuby-devel mailing list
MacRuby-devel@lists.macosforge.org
http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel
