Hi Laurent,

I filed a ticket on String#gsub performance.

While I was playing with regular expressions, I noticed another difference 
between Ruby 1.8.7 and MacRuby.  

The Fixnum values of the Regexp option constants (the ones passed to 
Regexp.new) are different in Ruby 1.8.7 and in MacRuby.

Ruby 1.8.7

p Regexp::IGNORECASE  # => 1
p Regexp::MULTILINE   # => 4
p Regexp::EXTENDED    # => 2

MacRuby

p Regexp::IGNORECASE  # => 2
p Regexp::MULTILINE   # => 32
p Regexp::EXTENDED    # => 4


Is this because of the difference between Ruby 1.8 and Ruby 1.9 or between 
Oniguruma and ICU?  Or was there a decision to assign different fixnums to 
these?
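
In the meantime, a minimal sketch of how I can keep my own code independent of the actual numbers: always combine and test the options through the named constants, never through integer literals. The pattern and the sample string below are made up for illustration only.

```ruby
# Combine flags by name; the integer behind each name is an implementation detail.
flags = Regexp::IGNORECASE | Regexp::MULTILINE
re = Regexp.new("hello.world", flags)

# In Ruby, MULTILINE lets "." match a newline, and IGNORECASE ignores case.
matched = !(re =~ "HELLO\nWORLD").nil?

# Test for a flag by masking with the named constant, not with a literal.
ignore_case = (re.options & Regexp::IGNORECASE) != 0
extended    = (re.options & Regexp::EXTENDED) != 0

puts "matched=#{matched} ignore_case=#{ignore_case} extended=#{extended}"
```

Written this way, the script should not care whether IGNORECASE happens to be 1 or 2 on a given implementation.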


Best,
Yasu


On 2010/12/03, at 6:26, Laurent Sansonetti wrote:

> Hi Yasu,
> 
> On Dec 2, 2010, at 5:20 AM, Yasu Imao wrote:
> 
>> Hi Laurent,
>> 
>> Thank you for your prompt work.  I tried the latest nightly build and it's 
>> much faster than 0.7.1.  Tests 1 and 2 are only 2 to 2.5 times slower than 
>> on Ruby 1.8.7, and Test 3 is about 5 times slower.  I also tried my app on 
>> this nightly build, and I can now say the MacRuby version of my app is quite 
>> usable.  From now on, I'll be more serious about rewriting my RubyCocoa apps 
>> in MacRuby.
> 
> Excellent! I think being from 2 to 5 times slower is still unacceptable, but 
> it's still better than before :)
> 
>> But I was curious about the difference between String#scan and String#gsub, 
>> so I also tested String#gsub without a block.  
>> 
>> text.gsub!(/\w+/,"test")
>> 
>> This was also about 5 times slower on MacRuby than on Ruby 1.8.7.  Could 
>> this be made a bit faster?  It is not in the main process of my apps (it is 
>> used for pre-processing of text), so the performance of String#gsub doesn't 
>> matter as much, though.
> 
> It may be possible. Could you file a ticket on trac and include a small 
> snippet? I will have a look.
> 
>> And thanks for looking into the regexp bug.  I guess I'll have to wait and 
>> see if Apple updates ICU on OS X.
> 
> After thinking more, it may be possible to statically compile against a newer 
> ICU and pass the appropriate linker flags so that symbols won't collide later 
> at runtime when loading Cocoa. I will investigate.
> 
> It may be a better solution overall, as MacRuby would use the same ICU 
> version regardless of the version of Mac OS X it runs on. However, it will 
> probably increase the runtime library size.
> 
> Laurent
> 
>> Best,
>> Yasu
>> 
>> On 2010/12/02, at 9:39, Laurent Sansonetti wrote:
>> 
>>> Hi Yasu,
>>> 
>>> It's committed to trunk, it should be available in tonight's nightly build, 
>>> so feel free to grab it :) http://www.macruby.org/files/nightlies. It will 
>>> also be in the upcoming 0.8 release.
>>> 
>>> I see your ticket about the look-ahead regexp bug, I will have a look later 
>>> today. Thanks for reporting the problem. Hopefully it can also be fixed for 
>>> 0.8.
>>> 
>>> Laurent
>>> 
>>> On Dec 1, 2010, at 4:29 PM, Yasu Imao wrote:
>>> 
>>>> Hi Laurent,
>>>> 
>>>> This is great!  I think I read about object allocation in the discussion 
>>>> of StringScanner performance (though I didn't understand exactly what 
>>>> was happening behind the scenes), so I guessed the problem was about 
>>>> using a block with regular expression match data.  
>>>> 
>>>> For the word frequency count feature, I could use the Test 2 script, but 
>>>> for other parts of the app, I need match information ($`, $' to be exact), 
>>>> so this performance improvement means a lot to my app.
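
To illustrate the point about match information, here is a minimal sketch of the block form of String#scan collecting the text before and after each match. The sample text and the variable names are made up for illustration and are not taken from the actual app.

```ruby
text = "the quick brown fox"   # hypothetical sample text
contexts = []

text.scan(/\w+/) do |word|
  # Inside the block, $` and $' hold the text before and after the current match.
  contexts << [$`, word, $']
end

contexts.each do |pre, word, post|
  puts "#{pre.inspect} [#{word}] #{post.inspect}"
end
```

The array returned by scan without a block only contains the matched strings, so the surrounding context is not available that way; this is why the block form matters here.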
>>>> 
>>>> Is this going to be in 0.8?  Then, I'll test this with my app.
>>>> 
>>>> By the way, the regular expression implementation itself seems to have a 
>>>> bug (not related to this, but to negative look-ahead), and I filed a 
>>>> ticket (though I'm not sure I did it properly).
>>>> 
>>>> Best,
>>>> Yasu
>>>> 
>>>> On 2010/12/02, at 8:50, Laurent Sansonetti wrote:
>>>> 
>>>>> I spoke too fast, having a second look I found that it was possible to 
>>>>> make the Match strings point to a unique object. I committed this 
>>>>> optimization in r4964 and verified that no regression is introduced.
>>>>> 
>>>>> Before:
>>>>> 
>>>>> $ time /usr/local/bin/macruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0); text.scan(/\w+/) {}"
>>>>> 
>>>>> real      0m2.430s
>>>>> user      0m1.628s
>>>>> sys       0m1.030s
>>>>> 
>>>>> After :)
>>>>> 
>>>>> $ time ./miniruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0); text.scan(/\w+/) {}"
>>>>> 
>>>>> real      0m0.121s
>>>>> user      0m0.100s
>>>>> sys       0m0.015s
>>>>> 
>>>>> Laurent
>>>>> 
>>>>> On Dec 1, 2010, at 2:46 PM, Laurent Sansonetti wrote:
>>>>> 
>>>>>> Hi Yasu,
>>>>>> 
>>>>>> I ran your tests in Shark. Tests 1 and 3 are significantly slower 
>>>>>> because #scan and #gsub are called with a block, which means MacRuby has 
>>>>>> to create a new Match object for every yield, to conform to the Ruby 
>>>>>> specs. Each Match object contains a copy of the original string.
>>>>>> 
>>>>>> MacRuby has a slow memory allocator (much slower than the original 
>>>>>> Ruby), so one must be careful not to allocate too many objects. This is 
>>>>>> something we are working on. Unfortunately, MacRuby doesn't fully 
>>>>>> control the object allocator, as it resides in the libauto library (the 
>>>>>> Objective-C garbage collector).
>>>>>> 
>>>>>> In your case, I recommend using the method in Test 2, which is to not 
>>>>>> pass a block. 
>>>>>> 
>>>>>> It is possible that we can reduce memory usage when doing regexps in 
>>>>>> MacRuby, however after having a quick look at the source code I am not 
>>>>>> sure something can be done for 0.8 :(
>>>>>> 
>>>>>> Laurent
>>>>>> 
>>>>>> On Dec 1, 2010, at 9:46 AM, Yasu Imao wrote:
>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I'm rewriting an app for text analysis in MacRuby, which I originally 
>>>>>>> wrote in RubyCocoa.  But I encountered a serious performance issue in 
>>>>>>> MacRuby, which is related to processing text using regular expressions. 
>>>>>>>  
>>>>>>> 
>>>>>>> I'm wondering if this will be taken care of in the near future (or 
>>>>>>> already done in 0.8?).
>>>>>>> 
>>>>>>> Below are my simple tests.  The first two are essentially the same, 
>>>>>>> with slightly different approaches.  Both simply count the frequency 
>>>>>>> of each word.  I want to use the first approach not to count word 
>>>>>>> frequencies, but in other processes.  The third one tests the speed 
>>>>>>> of String#gsub with a regular expression.  I felt String#gsub was 
>>>>>>> slow in my app, so I just wanted to test how slow it is compared to 
>>>>>>> RubyCocoa.
>>>>>>> 
>>>>>>> 
>>>>>>> Test 1 - scan-block
>>>>>>> 
>>>>>>> freq = Hash.new(0)
>>>>>>> text.scan(/\w+/) do |word|
>>>>>>>   freq[word] += 1
>>>>>>> end
>>>>>>> 
>>>>>>> 
>>>>>>> Test 2 - scan array.each
>>>>>>> 
>>>>>>> freq = Hash.new(0)
>>>>>>> text.scan(/\w+/).each do |word|
>>>>>>>   freq[word] += 1
>>>>>>> end
>>>>>>> 
>>>>>>> 
>>>>>>> Test 3 - gsub upcase
>>>>>>> 
>>>>>>> text.gsub!(/\w+/){|x| x.upcase}  
>>>>>>> 
>>>>>>> 
>>>>>>> The results are in seconds.  The original text is in English with 8154 
>>>>>>> words.  Each process was repeated 10 times to calculate processing 
>>>>>>> times.  Each test was run 3 times.
>>>>>>> 
>>>>>>> Interpreter    Test                        Run 1    Run 2    Run 3
>>>>>>> 
>>>>>>> Ruby 1.8.7     Test1 - scan-block          0.542    0.502    0.518
>>>>>>> Ruby 1.8.7     Test2 - scan array.each     0.399    0.392    0.399
>>>>>>> Ruby 1.8.7     Test3 - gsub upcase         0.384    0.349    0.390
>>>>>>> 
>>>>>>> MacRuby 0.7.1  Test1 - scan-block         27.612   27.707   27.453
>>>>>>> MacRuby 0.7.1  Test2 - scan array.each     3.556    3.616    3.554
>>>>>>> MacRuby 0.7.1  Test3 - gsub upcase        27.613   26.826   27.327
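
For anyone who wants to rerun the comparison, the timing procedure described above (each process repeated 10 times) could be sketched roughly as follows with Ruby's standard Benchmark module. This is not the original test script: the sample text, the method names and the use of non-destructive gsub (so that every repetition sees the same input) are assumptions made for illustration.

```ruby
require 'benchmark'

# Hypothetical stand-in for the original 8154-word English text.
text = "the quick brown fox jumps over the lazy dog " * 1000
repeat = 10

# Test 1: scan with a block.
def scan_block(text)
  freq = Hash.new(0)
  text.scan(/\w+/) { |word| freq[word] += 1 }
  freq
end

# Test 2: scan without a block, then iterate over the returned array.
def scan_each(text)
  freq = Hash.new(0)
  text.scan(/\w+/).each { |word| freq[word] += 1 }
  freq
end

# Test 3: gsub with a block (non-destructive here, unlike the original gsub!).
def gsub_upcase(text)
  text.gsub(/\w+/) { |x| x.upcase }
end

timings = {
  "scan-block"      => Benchmark.realtime { repeat.times { scan_block(text) } },
  "scan array.each" => Benchmark.realtime { repeat.times { scan_each(text) } },
  "gsub upcase"     => Benchmark.realtime { repeat.times { gsub_upcase(text) } }
}

timings.each { |name, secs| puts format("%-16s %.3f", name, secs) }
```

The absolute numbers will of course depend on the machine and on the input text, so only the ratios between interpreters are meaningful.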
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Yasu
>>>>>>> _______________________________________________
>>>>>>> MacRuby-devel mailing list
>>>>>>> MacRuby-devel@lists.macosforge.org
>>>>>>> http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel