I just did another test: I stripped out all split() and replace() calls and
all regexps except re0 from the test, and changed exec to set index and
input on the result. Yes, this is *really* microbenchmark-y, I know.
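
For reference, a minimal sketch of what "set index and input on the result"
amounts to, assuming the hard-coded /^ba/ stand-in from earlier in the
thread. It only handles string inputs and leaves lastIndex untouched, so it
is still not spec-compliant:

```javascript
// Hypothetical hard-coded stand-in for /^ba/; only exec is sketched.
function RegExpJS(reg) {
  this.source = reg.source; // kept for reference only, never interpreted
}

RegExpJS.prototype.exec = function (str) {
  // Same check as the str[0]/str[1] variant mentioned earlier in the thread;
  // it avoids depending on String.prototype.startsWith.
  if (str[0] === 'b' && str[1] === 'a') {
    var result = ['ba'];
    result.index = 0;   // /^ba/ can only match at the start of the string
    result.input = str; // the string that was matched against
    return result;
  }
  return null;
};
```

The native result array has the same shape: element 0 is the matched text,
plus the index and input properties.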

With these changes applied, our score increases to ~38.000. With the exec
function commented out, it goes down to ~22.000. V8, for some reason,
doesn't like the exec function, so I can't test it there; without the
function, V8 reaches ~38.800.
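
To try a comparison like this outside Octane, a rough timing loop along the
following lines should do. This is only a sketch: it is not the Octane
scoring code, and the iteration count, the input strings and all names in
it are made up for illustration.

```javascript
// Rough timing loop, not the Octane harness. ITERATIONS and inputs are arbitrary.
var ITERATIONS = 100000;
var inputs = ['banana', 'abba', 'bar', ''];

// Hand-rolled check equivalent to /^ba/ for string inputs only.
function execHardcoded(str) {
  if (str[0] === 'b' && str[1] === 'a') {
    var result = ['ba'];
    result.index = 0;
    result.input = str;
    return result;
  }
  return null;
}

var re0 = /^ba/;
function execNative(str) {
  return re0.exec(str);
}

// Runs fn over all inputs ITERATIONS times; returns elapsed ms and match count.
function time(fn) {
  var matches = 0;
  var start = Date.now();
  for (var i = 0; i < ITERATIONS; i++) {
    for (var j = 0; j < inputs.length; j++) {
      if (fn(inputs[j]) !== null) matches++;
    }
  }
  return { ms: Date.now() - start, matches: matches };
}

var nativeRun = time(execNative);
var hardcodedRun = time(execHardcoded);
```

Comparing nativeRun.ms with hardcodedRun.ms gives a rough idea only; the
match counts are there to check that both versions agree on the inputs.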

So even in the ideal case, a simple, unsafe and not-spec-compliant
hand-rolled JS version for one of the simplest-imaginable regexps doesn't
beat V8. For all I know, this might look very different in more complex
cases, of course.


On Sun, Jan 5, 2014 at 1:28 PM, hv1989 <[email protected]> wrote:

> Hi Julian,
>
> don't forget that it is a bit unfair to compare like that, for 2 reasons:
> 1) Exec needs to set re.lastIndex accordingly, i.e. set it to 0 at the
> start and correctly upon match.
> 2) The result array has two extra properties, index and input, that you
> don't set.
>
> There are possibly more extra things that need to happen that are not
> caused by Yarr at all...
>
> Now, one of the important improvements in your version is, I think, that
> it doesn't need to flatten the input string, while that happens by
> default for Yarr. That's one of the benefits of using JS: it has more
> information. Another one is that we don't need to jump out of JS to C++
> and then into Yarr JIT code again.
>
> Best Hannes
>
> On Sun, Jan 5, 2014 at 1:08 PM, Julian Viereck
> <[email protected]> wrote:
> > Hi Hannes,
> >
> > thanks a lot for your reply :)
> >
> >
> >> I'm not sure what you have tried. But I tried your hardcoded version.
> >
> > I tried to make my testing more transparent and uploaded my code to a
> > GitHub repo:
> >
> >  https://github.com/jviereck/regexp.js-octane
> >
> >
> >> Though I would suggest to try to run the numbers again, since the
> >> numbers differ so much from mine.
> >
> > Looking at the numbers, I think they are fine if we assume you have a
> > more powerful PC that results in a score roughly 2x my value by default.
> > Your score values before and after differ by ~200 points, while mine do
> > by ~100, so there is the 2x speed difference.
> >
> >
> >> we see 2 signatures in "Exec". So it is less specialized (not much, just
> >> an extra if to distinguish the paths at the "exec" call). I'm sure if
> >> all regexps would be transformed to "RegExpJS" we would get that back.
> >> It would only see 1 signature again.
> >
> > Thanks a lot for this hint! Based on this input, I have created a new
> > "Exec2" function, which is an exact copy of the "Exec" function, but the
> > "Exec2" function is only used for executing the re0 regular expression
> > [1]. Using the hard coded RegExpJS function for re0 [2] resulted in these
> > numbers:
> >
> > before: 1582.7
> > (https://github.com/jviereck/regexp.js-octane/tree/e925606d0850b5c94d1622f7cfdcd2ab2c08e767)
> > after: 1632.7
> > (https://github.com/jviereck/regexp.js-octane/tree/0630eec8e656f3df5effc27114ba80ffe970d53e)
> >
> > These numbers are the average of 10 runs. There seems to be a speedup
> > using the hardcoded JS version.
> >
> > These results look more promising. However, they should be treated with
> > care, as getting /^ba/ to work is quite simple and the implementation
> > makes very good use of JS functions (e.g. String.prototype.startsWith),
> > while a more complicated example including backtracking might yield
> > different results.
> >
> > Do you think it is worth implementing a hard-coded version of the second
> > regular expression tested in Octane:
> >
> >   var re1 = /(((\w+):\/\/)([^\/:]*)(:(\d+))?)?([^#?]*)(\?([^#]*))?(#(.*))?/;
> >
> > to see how good the performance can get?
> >
> > Best,
> >
> > - Julian
> >
> >
> > [1]:
> > https://github.com/jviereck/regexp.js-octane/commit/0d6e01d36a7d5dc24c385e3437e6b740dbd9da78#diff-0
> >
> > [2]:
> > https://github.com/jviereck/regexp.js-octane/commit/0630eec8e656f3df5effc27114ba80ffe970d53e
> >
> >
> > On 05/01/14 12:13, hv1989 wrote:
> >
> > Hi Julian,
> >
> > I'm not sure what you have tried. But I tried your hardcoded version
> > (i.e. defining RegExpJS ourselves, with the ^ba hack).
> >
> > - octane1.0-regexp:
> > before: 4510
> > after: 4658
> >
> > - octane2.0-regexp:
> > before: 2585
> > after: 2390
> >
> > So in octane1.0 that is indeed an improvement. For octane2.0 it is not,
> > and that has a reason. In octane2.0 all calls to "exec()" go through a
> > wrapper, "Exec()", that does some extra testing to make sure the result
> > is correct. Using TypeInformation we can find out this is only called
> > with "RegExp" as first parameter, so we can optimize that. Now with
> > "new RegExpJS(/^ba/);" we see 2 signatures in "Exec". So it is less
> > specialized (not much, just an extra if to distinguish the paths at
> > the "exec" call). I'm sure if all regexps would be transformed to
> > "RegExpJS" we would get that back. It would only see 1 signature
> > again.
> >
> > Now about RegExp.JS bringing such a big loss: that is possible. Yarr
> > isn't bad, and in octane-regexp we are only stuck in the interpreter
> > for 3%, and even in that case the interpreter isn't that slow. We
> > wouldn't win much on octane-regexp if we could JIT everything (which
> > is the problem for the other benchmarks like jQuery and Peacekeeper).
> > It would bring at most a 4% gain for octane-regexp. Though I would
> > suggest to try to run the numbers again, since the numbers differ so
> > much from mine.
> >
> > Best Hannes
> >
> > On Sun, Jan 5, 2014 at 11:31 AM,  <[email protected]> wrote:
> >
> > On Thursday, January 2, 2014 6:47:58 PM UTC+1, Nicolas Pierron wrote:
> >
> > On 01/02/2014 07:31 AM, Nicolas B. Pierron wrote:
> >
> > I should have written that in the past tense …
> >
> > https://github.com/jviereck/regexp.js
> >
> > So far I hadn't collected any performance numbers for RegExp.JS. I looked
> > into this and, thanks to the help of Till, I got the Octane benchmark
> > running in the JS shell [1].
> >
> > Before converting the entire Octane RegExp benchmark to run using
> > RegExp.JS, I thought I'd just try the first RegExp tested in the
> > benchmark. In terms of code changes, this means:
> >
> >   diff --git a/regexp.js b/regexp.js
> >   - var re0 = /^ba/;
> >   + var re0 = new RegExpJS(/^ba/);
> >
> > Just changing this one RegExp caused the score to drop from ~1480 on my
> > machine to 77 (!!!) using the RegExp.JS library (& my.mood = :( ).
> >
> > Okay, so maybe RegExp.JS is doing something completely wrong, which is
> > why I tried another dumb approach and defined:
> >
> >   function RegExpJS(reg) { }
> >
> >   RegExpJS.prototype.exec = function(str) {
> >     if (str.startsWith('ba')) {
> >       return ['ba'];
> >     } else {
> >       return null;
> >     }
> >   };
> >
> > This RegExpJS object ONLY works HARDCODED with the first regexp of the
> > Octane benchmark (/^ba/). Cheating, I know, but let's see where this gets
> > us in terms of performance. Running the regexp.js benchmark with this
> > RegExpJS definition and the modification |var re0 = new RegExpJS(/^ba/);|
> > resulted in a score of ~1340. Better than 77, but still a huge drop
> > compared to 1480 from changing only one RegExp in the benchmark!
> >
> > (If you wonder whether replacing the |if (str.startsWith('ba'))| call with
> > |if (str[0] == 'b' && str[1] == 'a') {| helps: no, that doesn't make any
> > difference in terms of performance :/).
> >
> > ---
> >
> > Without knowing anything about the SpiderMonkey JS internals, this very
> > small benchmark raises the following questions for me:
> >
> > 1) Is the YARR implementation so much faster than anything written in
> > plain JS (even if the JS is highly optimized for the RegExp and matches
> > the string in the most optimal way)?
> > 2) Is there a performance bug in SpiderMonkey that makes even the plain
> > RegExpJS running only /^ba/ so slow?
> >
> >
> >
> > Cheers,
> >
> > - Julian
> >
> >
> >
> >
> > [1] Using the js shell provided at
> > http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/dated
> > on the 04-Jan-2014 11:50.
> >
> >
> >
> > _______________________________________________
> > dev-tech-js-engine-internals mailing list
> > [email protected]
> > https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
> >
> >
> > --
> >
> > - Julian