Hi Laurent,

I filed a ticket on String#gsub performance.
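
For reference, a small self-contained snippet of the kind that can accompany such a ticket might look like the sketch below. The sample text, the repetition count and the helper name time_gsub are placeholders chosen for illustration, not the data or code from the actual ticket. It relies only on Ruby's standard Benchmark module.

```ruby
require 'benchmark'

# Placeholder input: the real tests used an English text of 8154 words.
SAMPLE_TEXT = "the quick brown fox jumps over the lazy dog " * 1000

# Hypothetical helper: time `repeats` runs of String#gsub! on a fresh copy
# of the text, once with a replacement string and once with a block.
def time_gsub(text, repeats)
  with_string = Benchmark.realtime do
    repeats.times { text.dup.gsub!(/\w+/, "test") }
  end
  with_block = Benchmark.realtime do
    repeats.times { text.dup.gsub!(/\w+/) { |x| x.upcase } }
  end
  { :string => with_string, :block => with_block }
end

result = time_gsub(SAMPLE_TEXT, 10)
puts "gsub with string: %.3f s" % result[:string]
puts "gsub with block:  %.3f s" % result[:block]
```

Running the same file under each interpreter gives directly comparable numbers, since each run works on its own copy of the text.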

While I was playing with regex, I noticed another difference(?) between Ruby
1.8.7 and MacRuby. In the Regexp class, the Fixnums assigned to the Regexp.new
options differ between Ruby 1.8.7 and MacRuby:

Ruby 1.8.7
p Regexp::IGNORECASE => 1
p Regexp::MULTILINE => 4
p Regexp::EXTENDED => 2

MacRuby
p Regexp::IGNORECASE => 2
p Regexp::MULTILINE => 32
p Regexp::EXTENDED => 4

Is this because of the difference between Ruby 1.8 and Ruby 1.9, or between
Oniguruma and ICU? Or was there a decision to assign different Fixnums to
these?

Best,
Yasu

On 2010/12/03, at 6:26, Laurent Sansonetti wrote:

> Hi Yasu,
>
> On Dec 2, 2010, at 5:20 AM, Yasu Imao wrote:
>
>> Hi Laurent,
>>
>> Thank you for your prompt work. I tried the latest nightly build and it's
>> much faster than 0.7.1. Tests 1 and 2 are only 2 - 2.5 times slower
>> than on Ruby 1.8.7, and Test 3 is about 5 times slower. I also tried my
>> app on this nightly build. Now I can say the MacRuby version of my app is
>> quite usable. From now on, I'll be more serious about rewriting my
>> RubyCocoa apps in MacRuby.
>
> Excellent! I think being from 2 to 5 times slower is still unacceptable, but
> it's still better than before :)
>
>> But I was curious about the difference between String#scan and String#gsub,
>> so I also tested String#gsub without a block.
>>
>> text.gsub!(/\w+/,"test")
>>
>> This was also about 5 times slower on MacRuby than on Ruby 1.8.7. Could
>> this be a bit faster? This is not in the main process of my apps (it's for
>> pre-processing of text), so the performance of String#gsub doesn't matter
>> as much, though.
>
> It's maybe possible. Could you file a ticket on trac and include a small
> snippet? I will have a look.
>
>> And thanks for looking into the regexp bug. I guess I'll have to wait and
>> see if Apple updates ICU on OS X.
>
> After thinking more, it may be possible to statically compile against a newer
> ICU and pass the appropriate linker flags so that symbols won't collide later
> at runtime when loading Cocoa. I will investigate.
>
> It may be a better solution overall, as MacRuby would use the same ICU
> version regardless of the version of Mac OS X it runs on. However, it will
> probably increase the runtime library size.
>
> Laurent
>
>> Best,
>> Yasu
>>
>> On 2010/12/02, at 9:39, Laurent Sansonetti wrote:
>>
>>> Hi Yasu,
>>>
>>> It's committed to trunk, it should be available in tonight's nightly build,
>>> so feel free to grab it :) http://www.macruby.org/files/nightlies. It will
>>> also be in the upcoming 0.8 release.
>>>
>>> I see your ticket about the look-ahead regexp bug, I will have a look later
>>> today. Thanks for reporting the problem. Hopefully it can also be fixed for
>>> 0.8.
>>>
>>> Laurent
>>>
>>> On Dec 1, 2010, at 4:29 PM, Yasu Imao wrote:
>>>
>>>> Hi Laurent,
>>>>
>>>> This is great! I think I read in the discussion of StringScanner
>>>> performance about object allocation (though I didn't understand what
>>>> exactly was happening behind the scenes), so I guessed it was about
>>>> 'using block' with regular expression match data.
>>>>
>>>> For a word frequency count feature, I could use the Test 2 script, but for
>>>> other parts of the app, I needed match information ($`, $' to be exact),
>>>> so this performance improvement means a lot to my app.
>>>>
>>>> Is this going to be in 0.8? Then, I'll test this with my app.
>>>>
>>>> By the way, the regular expression itself seems to have a bug (not related
>>>> to this, but to negative look-ahead) and I issued(?) a ticket (though I'm
>>>> not sure I did it properly).
>>>>
>>>> Best,
>>>> Yasu
>>>>
>>>> On 2010/12/02, at 8:50, Laurent Sansonetti wrote:
>>>>
>>>>> I spoke too fast, having a second look I found that it was possible to
>>>>> make the Match strings point to a unique object.
>>>>> I committed this optimization in r4964 and verified that no regression
>>>>> is introduced.
>>>>>
>>>>> Before:
>>>>>
>>>>> $ time /usr/local/bin/macruby -e "text=File.read('/tmp/foo.txt');
>>>>> freq=Hash.new(0); text.scan(/\w+/) {}"
>>>>>
>>>>> real 0m2.430s
>>>>> user 0m1.628s
>>>>> sys 0m1.030s
>>>>>
>>>>> After :)
>>>>>
>>>>> $ time ./miniruby -e "text=File.read('/tmp/foo.txt'); freq=Hash.new(0);
>>>>> text.scan(/\w+/) {}"
>>>>>
>>>>> real 0m0.121s
>>>>> user 0m0.100s
>>>>> sys 0m0.015s
>>>>>
>>>>> Laurent
>>>>>
>>>>> On Dec 1, 2010, at 2:46 PM, Laurent Sansonetti wrote:
>>>>>
>>>>>> Hi Yasu,
>>>>>>
>>>>>> I ran your tests in Shark. Tests 1 and 3 are significantly slower
>>>>>> because #scan and #gsub are called with a block, which means MacRuby has
>>>>>> to create a new Match object for every yield, to conform to the Ruby
>>>>>> specs. Each Match object contains a copy of the original string.
>>>>>>
>>>>>> MacRuby has a slow memory allocator (much slower than the original
>>>>>> Ruby), so one must be careful to not allocate too many objects. This is
>>>>>> something we are working on, unfortunately MacRuby doesn't fully control
>>>>>> the object allocator, as it resides in the libauto library (the
>>>>>> Objective-C garbage collector).
>>>>>>
>>>>>> In your case, I recommend using the method in Test 2, which is to not
>>>>>> pass a block.
>>>>>>
>>>>>> It is possible that we can reduce memory usage when doing regexps in
>>>>>> MacRuby, however after having a quick look at the source code I am not
>>>>>> sure something can be done for 0.8 :(
>>>>>>
>>>>>> Laurent
>>>>>>
>>>>>> On Dec 1, 2010, at 9:46 AM, Yasu Imao wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I'm rewriting an app for text analysis in MacRuby, which I originally
>>>>>>> wrote in RubyCocoa. But I encountered a serious performance issue in
>>>>>>> MacRuby, which is related to processing text using regular expressions.
>>>>>>>
>>>>>>> I'm wondering if this will be taken care of in the near future (or
>>>>>>> already done in 0.8?).
>>>>>>>
>>>>>>> Below are my simple tests. The first two are essentially the same with
>>>>>>> a slightly different approach. Both simply count the frequency of
>>>>>>> each word. I want to use the first approach not to count word
>>>>>>> frequencies, but in other processes. The third one is to test the
>>>>>>> speed of String#gsub with a regular expression. I felt String#gsub was
>>>>>>> slow in my app, so I just wanted to test how slow it is compared to
>>>>>>> RubyCocoa.
>>>>>>>
>>>>>>>
>>>>>>> Test 1 - scan-block
>>>>>>>
>>>>>>> freq = Hash.new(0)
>>>>>>> text.scan(/\w+/) do |word|
>>>>>>>   freq[word] += 1
>>>>>>> end
>>>>>>>
>>>>>>>
>>>>>>> Test 2 - scan array.each
>>>>>>>
>>>>>>> freq = Hash.new(0)
>>>>>>> text.scan(/\w+/).each do |word|
>>>>>>>   freq[word] += 1
>>>>>>> end
>>>>>>>
>>>>>>>
>>>>>>> Test 3 - gsub upcase
>>>>>>>
>>>>>>> text.gsub!(/\w+/){|x| x.upcase}
>>>>>>>
>>>>>>>
>>>>>>> The results are in seconds. The original text is in English with 8154
>>>>>>> words. Each process was repeated 10 times to calculate processing
>>>>>>> times. Each test was run 3 times.
>>>>>>>
>>>>>>> Ruby 1.8.7    Test1 - scan-block:        0.542,  0.502,  0.518
>>>>>>> Ruby 1.8.7    Test2 - scan array.each:   0.399,  0.392,  0.399
>>>>>>> Ruby 1.8.7    Test3 - gsub upcase:       0.384,  0.349,  0.390
>>>>>>>
>>>>>>> MacRuby 0.7.1 Test1 - scan-block:       27.612, 27.707, 27.453
>>>>>>> MacRuby 0.7.1 Test2 - scan array.each:   3.556,  3.616,  3.554
>>>>>>> MacRuby 0.7.1 Test3 - gsub upcase:      27.613, 26.826, 27.327
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Yasu
>>>>>>> _______________________________________________
>>>>>>> MacRuby-devel mailing list
>>>>>>> MacRuby-devel@lists.macosforge.org
>>>>>>> http://lists.macosforge.org/mailman/listinfo.cgi/macruby-devel
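
On the Regexp option values discussed at the top of this message: a minimal sketch of code that does not depend on the integers behind the constants. It assumes only that each implementation defines Regexp::IGNORECASE, Regexp::MULTILINE and Regexp::EXTENDED as distinct bit flags. The helper name option_names and the sample pattern are made up for illustration.

```ruby
# Build and inspect a Regexp using the symbolic option constants only,
# so the code does not care which integers an implementation assigns.

# Hypothetical helper (not part of Ruby): list the names of the options
# that are set on a Regexp.
def option_names(regexp)
  names = []
  names << "IGNORECASE" if (regexp.options & Regexp::IGNORECASE) != 0
  names << "MULTILINE"  if (regexp.options & Regexp::MULTILINE) != 0
  names << "EXTENDED"   if (regexp.options & Regexp::EXTENDED) != 0
  names
end

# Combine flags with |, never with literal numbers such as 1 or 4.
flags = Regexp::IGNORECASE | Regexp::MULTILINE
re = Regexp.new("hello.world", flags)

p option_names(re)
p re =~ "HELLO\nWORLD"
```

Code written this way should behave the same whether IGNORECASE happens to be 1 or 2; only code that passes hard-coded numbers to Regexp.new would notice the difference.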