I spoke too soon. On a second look, I found that it was possible to make the 
Match strings point to a unique object. I committed this optimization in r4964 
and verified that it introduces no regressions.

Before:

$ time /usr/local/bin/macruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0); text.scan(/\w+/) {}"

real    0m2.430s
user    0m1.628s
sys     0m1.030s

After :)

$ time ./miniruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0); text.scan(/\w+/) {}"

real    0m0.121s
user    0m0.100s
sys     0m0.015s
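
To repeat the comparison on another text, something like the sketch below should work. It uses Ruby's standard Benchmark module and times the block form of #scan (Test 1) against the array form (Test 2). The sample text and the two helper method names are placeholders I made up for illustration, not part of the original tests; substitute File.read on your own file.

```ruby
require 'benchmark'

# Placeholder sample text (an assumption); replace with File.read('/path/to/your.txt').
text = "the quick brown fox jumps over the lazy dog " * 1000

# Test 1 style: #scan with a block, one yield per match.
# Hypothetical helper name, for illustration only.
def count_with_block(text)
  freq = Hash.new(0)
  text.scan(/\w+/) { |word| freq[word] += 1 }
  freq
end

# Test 2 style: #scan without a block returns an Array, which is then iterated.
# Hypothetical helper name, for illustration only.
def count_with_array(text)
  freq = Hash.new(0)
  text.scan(/\w+/).each { |word| freq[word] += 1 }
  freq
end

# Repeat each approach 10 times, as in the original tests.
Benchmark.bm(12) do |bm|
  bm.report('scan-block') { 10.times { count_with_block(text) } }
  bm.report('scan-array') { 10.times { count_with_array(text) } }
end
```

Both helpers should return the same frequency table, so any difference in the timings comes from how #scan is called rather than from the counting itself.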

Laurent

On Dec 1, 2010, at 2:46 PM, Laurent Sansonetti wrote:

> Hi Yasu,
> 
> I ran your tests in Shark. Tests 1 and 3 are significantly slower because 
> #scan and #gsub are called with a block, which means MacRuby has to create a 
> new Match object for every yield, to conform to the Ruby specs. Each Match 
> object contains a copy of the original string.
> 
> MacRuby has a slow memory allocator (much slower than the original Ruby), so 
> one must be careful not to allocate too many objects. This is something we 
> are working on. Unfortunately, MacRuby doesn't fully control the object 
> allocator, as it resides in the libauto library (the Objective-C garbage 
> collector).
> 
> In your case, I recommend using the method in Test 2, which is to not pass a 
> block. 
> 
> It is possible that we can reduce memory usage when doing regexps in MacRuby. 
> However, after a quick look at the source code, I am not sure anything can 
> be done for 0.8 :(
> 
> Laurent
> 
> On Dec 1, 2010, at 9:46 AM, Yasu Imao wrote:
> 
>> Hello,
>> 
>> I'm rewriting an app for text analysis in MacRuby, which I originally wrote 
>> in RubyCocoa.  But I encountered a serious performance issue in MacRuby, 
>> which is related to processing text using regular expressions.  
>> 
>> I'm wondering if this will be taken care of in the near future (or already 
>> done in 0.8?).
>> 
>> Below are my simple tests.  The first two are essentially the same, with a 
>> slightly different approach.  Both simply count the frequency of each 
>> word.  I want to use the first approach for purposes other than counting 
>> word frequencies.  The third one tests the speed of String#gsub with a 
>> regular expression.  I felt String#gsub was slow in my app, so I just wanted 
>> to test how slow it is compared to RubyCocoa.
>> 
>> 
>> Test 1 - scan-block
>> 
>> freq = Hash.new(0)
>> text.scan(/\w+/) do |word|
>>  freq[word] += 1
>> end
>> 
>> 
>> Test 2 - scan array.each
>> 
>> freq = Hash.new(0)
>> text.scan(/\w+/).each do |word|
>>  freq[word] += 1
>> end
>> 
>> 
>> Test 3 - gsub upcase
>> 
>> text.gsub!(/\w+/){|x| x.upcase}  
>> 
>> 
>> The results are in seconds.  The original text is in English with 8154 
>> words.  Each process was repeated 10 times to calculate processing times.  
>> Each test was run 3 times.
>> 
>> Implementation  Test                       Run 1    Run 2    Run 3
>> Ruby 1.8.7      Test1 - scan-block         0.542    0.502    0.518
>> Ruby 1.8.7      Test2 - scan array.each    0.399    0.392    0.399
>> Ruby 1.8.7      Test3 - gsub upcase        0.384    0.349    0.390
>> 
>> MacRuby 0.7.1   Test1 - scan-block        27.612   27.707   27.453
>> MacRuby 0.7.1   Test2 - scan array.each    3.556    3.616    3.554
>> MacRuby 0.7.1   Test3 - gsub upcase       27.613   26.826   27.327
>> 
>> 
>> Thanks,
>> Yasu
>> _______________________________________________
>> MacRuby-devel mailing list
>> MacRuby-devel@lists.macosforge.org
>> http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel
> 

