Re: [MacRuby-devel] Regular expression related performance

Laurent Sansonetti Wed, 01 Dec 2010 16:40:22 -0800

Hi Yasu,

It's committed to trunk and should be available in tonight's nightly build, so 
feel free to grab it :) http://www.macruby.org/files/nightlies
It will also be in the upcoming 0.8 release.


I saw your ticket about the look-ahead regexp bug; I will have a look later 
today. Thanks for reporting the problem. Hopefully it can also be fixed for 0.8.

Laurent

On Dec 1, 2010, at 4:29 PM, Yasu Imao wrote:

> Hi Laurent,
> 
> This is great!  I remember reading about object allocation in the discussion 
> of StringScanner performance (though I didn't understand exactly what was 
> happening behind the scenes), so I guessed the problem was related to using 
> a block with regular expression match data.
> 
> For the word frequency count feature, I could use the Test 2 script, but 
> other parts of the app need match information ($` and $', to be exact), so 
> this performance improvement means a lot to my app.
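
A minimal sketch, in standard Ruby, of why the block form matters for this use case: inside a #scan block, $` and $' hold the text before and after the current match. The sample string and the `contexts` name are assumptions for illustration and are not taken from the app.

```ruby
# Hypothetical sample text; the original corpus is not shown in the thread.
text = "the quick brown fox"

contexts = []
text.scan(/\w+/) do |word|
  # Inside the block, $` is the text before the current match
  # and $' is the text after it.
  contexts << [$`, word, $']
end

contexts.each { |before, word, after| p [before, word, after] }
```

Without a block, #scan returns only the matched strings, so this per-match context is not available for each word.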
> 
> Is this going to be in 0.8?  If so, I'll test it with my app.
> 
> By the way, the regular expression engine itself seems to have a bug (not 
> related to this, but to negative look-ahead), and I filed a ticket (though 
> I'm not sure I did it properly).
> 
> Best,
> Yasu
> 
> On 2010/12/02, at 8:50, Laurent Sansonetti wrote:
> 
>> I spoke too fast. Having a second look, I found that it was possible to make 
>> the Match strings point to a single shared object. I committed this 
>> optimization in r4964 and verified that it introduces no regression.
>> 
>> Before:
>> 
>> $ time /usr/local/bin/macruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0); text.scan(/\w+/) {}"
>> 
>> real 0m2.430s
>> user 0m1.628s
>> sys  0m1.030s
>> 
>> After :)
>> 
>> $ time ./miniruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0); text.scan(/\w+/) {}"
>> 
>> real 0m0.121s
>> user 0m0.100s
>> sys  0m0.015s
>> 
>> Laurent
>> 
>> On Dec 1, 2010, at 2:46 PM, Laurent Sansonetti wrote:
>> 
>>> Hi Yasu,
>>> 
>>> I ran your tests in Shark. Tests 1 and 3 are significantly slower because 
>>> #scan and #gsub are called with a block, which means MacRuby has to create 
>>> a new Match object for every yield, to conform to the Ruby specs. Each 
>>> Match object contains a copy of the original string.
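
A rough illustration of the per-yield match data described above, written against standard Ruby rather than MacRuby internals: each block invocation sees its own MatchData (available through `Regexp.last_match`), and each one refers back to the string being scanned. The sample text and variable names are assumptions.

```ruby
# Hypothetical sample text for illustration.
text = "one two three"

matches = []
text.scan(/\w+/) do
  # Regexp.last_match returns the MatchData for the match being yielded.
  matches << Regexp.last_match
end

words   = matches.map { |m| m[0] }        # matched substrings
offsets = matches.map { |m| m.begin(0) }  # start offset of each match

p words
p offsets
# Every MatchData exposes the scanned string through #string.
p matches.all? { |m| m.string == text }
```

One match object per word is what makes the block form allocation-heavy on a large text.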
>>> 
>>> MacRuby has a slow memory allocator (much slower than the original Ruby's), 
>>> so one must be careful not to allocate too many objects. This is something 
>>> we are working on. Unfortunately, MacRuby doesn't fully control the object 
>>> allocator, as it resides in the libauto library (the Objective-C garbage 
>>> collector).
>>> 
>>> In your case, I recommend using the method in Test 2, which is to not pass 
>>> a block. 
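
A small sketch, assuming a made-up sample string, showing that the block form and the array form produce the same word counts, so switching to the Test 2 style does not change the result of a frequency count:

```ruby
# Hypothetical sample text for illustration.
text = "a b a c b a"

# Block form (Test 1 style): the block is called once per match.
freq_block = Hash.new(0)
text.scan(/\w+/) { |word| freq_block[word] += 1 }

# Array form (Test 2 style): #scan first returns an array of matched
# strings, which is then iterated with #each.
freq_array = Hash.new(0)
text.scan(/\w+/).each { |word| freq_array[word] += 1 }

p freq_block == freq_array
p freq_array
```

The trade-off is that the array form builds an intermediate array of all matches, and inside the #each block the match variables refer only to the last match of the scan, not to the current word.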
>>> 
>>> It may be possible to reduce memory usage when doing regexps in MacRuby. 
>>> However, after a quick look at the source code, I am not sure anything 
>>> can be done for 0.8 :(
>>> 
>>> Laurent
>>> 
>>> On Dec 1, 2010, at 9:46 AM, Yasu Imao wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I'm rewriting an app for text analysis in MacRuby, which I originally 
>>>> wrote in RubyCocoa.  But I encountered a serious performance issue in 
>>>> MacRuby, which is related to processing text using regular expressions.  
>>>> 
>>>> I'm wondering if this will be taken care of in the near future (or already 
>>>> done in 0.8?).
>>>> 
>>>> Below are my simple tests.  The first two are essentially the same with a 
>>>> slightly different approach.  Both simply count the frequency of each 
>>>> word.  I want to use the first approach not for counting word frequencies 
>>>> but for other processing.  The third one tests the speed of String#gsub 
>>>> with a regular expression.  String#gsub felt slow in my app, so I just 
>>>> wanted to test how slow it is compared to RubyCocoa.
>>>> 
>>>> 
>>>> Test 1 - scan-block
>>>> 
>>>> freq = Hash.new(0)
>>>> text.scan(/\w+/) do |word|
>>>>   freq[word] += 1
>>>> end
>>>> 
>>>> 
>>>> Test 2 - scan array.each
>>>> 
>>>> freq = Hash.new(0)
>>>> text.scan(/\w+/).each do |word|
>>>>   freq[word] += 1
>>>> end
>>>> 
>>>> 
>>>> Test 3 - gsub upcase
>>>> 
>>>> text.gsub!(/\w+/) { |x| x.upcase }
>>>> 
>>>> 
>>>> The results are in seconds.  The original text is in English with 8154 
>>>> words.  Each process was repeated 10 times to calculate the processing 
>>>> time, and each test was run 3 times.
>>>> 
>>>>                                           Run 1   Run 2   Run 3
>>>> Ruby 1.8.7    Test1 - scan-block:         0.542   0.502   0.518
>>>> Ruby 1.8.7    Test2 - scan array.each:    0.399   0.392   0.399
>>>> Ruby 1.8.7    Test3 - gsub upcase:        0.384   0.349   0.390
>>>> 
>>>> MacRuby 0.7.1 Test1 - scan-block:        27.612  27.707  27.453
>>>> MacRuby 0.7.1 Test2 - scan array.each:    3.556   3.616   3.554
>>>> MacRuby 0.7.1 Test3 - gsub upcase:       27.613  26.826  27.327
>>>> 
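The measurement procedure described above (each process repeated 10 times per timing) could be sketched with Ruby's standard Benchmark module as follows. The original script and text are not shown in the thread, so the synthetic 8154-word sample, the variable names, and the use of `Benchmark.realtime` are all assumptions.

```ruby
require 'benchmark'

# Hypothetical stand-in for the original English text: 9 words x 906 = 8154 words.
sample  = (%w[the quick brown fox jumps over the lazy dog] * 906).join(' ')
repeats = 10

# The three tests from the message, each taking the text to process.
tests = {
  'scan-block' => lambda { |text|
    freq = Hash.new(0)
    text.scan(/\w+/) { |word| freq[word] += 1 }
  },
  'scan array.each' => lambda { |text|
    freq = Hash.new(0)
    text.scan(/\w+/).each { |word| freq[word] += 1 }
  },
  'gsub upcase' => lambda { |text|
    text.gsub!(/\w+/) { |x| x.upcase }
  }
}

results = {}
tests.each do |name, body|
  # Time `repeats` runs; each run gets a fresh copy because gsub! mutates its receiver.
  results[name] = Benchmark.realtime { repeats.times { body.call(sample.dup) } }
end

results.each { |name, secs| puts format('%-16s %.3f', name, secs) }
```

Running the script three times would give three timings per test, matching the layout of the results above; absolute numbers depend on the machine and the Ruby implementation.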
>>>> 
>>>> Thanks,
>>>> Yasu
>>>> _______________________________________________
>>>> MacRuby-devel mailing list
>>>> MacRuby-devel@lists.macosforge.org
>>>> http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel
