Re: SqlBabelParserImpl instantiation is slow because it instantiates a java.lang.Error

Julian Hyde Thu, 11 Jul 2024 12:04:34 -0700

> PS By coincidence, Aleksey is a colleague at Google, working on BigQuery 
> performance.


Oops. I was thinking of a different Alexey. I see that Aleksey
Shipilëv is at AWS.

Julian

On Thu, Jul 11, 2024 at 7:43 AM Julian Hyde <jhyde.apa...@gmail.com> wrote:
>
> Thanks for the article, Stamatis. I learned a lot.
>
> I believe that upgrading to JavaCC 7.0.5 (or higher) will solve the first 
> problem in Aleksey‘s article (stack trace construction) but not the second 
> (stack trace unwinding).
>
> JavaCC uses exceptions extensively in its lookahead logic. (There hundreds of 
> ‘catch (LookaheadSuccess)’ blocks in our code, and I would guess that we 
> invoke lookahead once per line in a typical query, but the number of frames 
> to be unwound each lookahead is very small.) I have to hope that the JavaCC 
> team have evaluated the performance implications.
>
> Julian
>
> PS By coincidence, Aleksey is a colleague at Google, working on BigQuery 
> performance.
>
> > On Jul 11, 2024, at 05:48, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> >
> > This reminds me of a very nice benchmark done by Aleksey Shipilёv
> > around the performance implications of Exception [1]. Among other
> > things the benchmark analyses the impact of stack depth in the
> > performance of exceptions and is very similar to the parser benchmark
> > contributed by Cyril.
> >
> > The closing meme says everything: "If you didn't want exceptions to
> > ruin your performance you shouldn't have used them for the regular
> > control flow"
> >
> > Best,
> > Stamatis
> >
> > [1] https://shipilev.net/blog/2014/exceptional-performance/
> >
> >> On Thu, Jul 11, 2024 at 10:12 AM Cyril DE CATHEU
> >> <cy...@startree.ai.invalid> wrote:
> >>
> >> https://github.com/apache/calcite/pull/3853
> >>
> >>> On Wed, Jul 10, 2024 at 5:29 PM Julian Hyde <jhyde.apa...@gmail.com> 
> >>> wrote:
> >>>
> >>> PS Your method of increasing stack size by making recursive calls seems
> >>> valid. Your experiments show an improvement before and after the fix.
> >>> That’s good enough for me.
> >>>
> >>> I plan to use your benchmark to check that upgrading JavaCC doesn’t
> >>> introduce other performance problems.
> >>>
> >>> Julian
> >>>
> >>>> On Jul 10, 2024, at 08:18, Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >>>>
> >>>> A PR with the benchmark would be great. If it is based on jmh I will
> >>> merge it to main, so others can run the benchmark in future, and maybe do
> >>> more performance optimizations on the parser.
> >>>>
> >>>> By the way, it can be on whichever parser variant (core, babel, server)
> >>> is most convenient. I’m pretty sure they have the same performance issues.
> >>>>
> >>>> I’m continuing to work on upgrading JavaCC to 7.0.13, and should have a
> >>> PR ready soon.
> >>>>
> >>>> Julian
> >>>>
> >>>>> On Jul 10, 2024, at 03:54, Cyril DE CATHEU <cy...@startree.ai.invalid>
> >>> wrote:
> >>>>>
> >>>>> Hey Mihai and Julian,
> >>>>>
> >>>>> Thanks for looking at this.
> >>>>> The performance hit depends on the size of the stack.
> >>>>> Here are the numbers I have for the moment: (*not run in a good
> >>> benchmark
> >>>>> setup* but gives an idea)
> >>>>>
> >>>>> Benchmark                                                          Mode
> >>>>> Cnt   Score   Error  Units
> >>>>> BabelParserBenchmark.instantiateBabelParser                        avgt
> >>>>> 7  14.128 ± 0.041  us/op
> >>>>> BabelParserBenchmark.instantiateBabelParserErrorFixed              avgt
> >>>>> 7  12.054 ± 0.039  us/op
> >>>>>
> >>>>> BabelParserBenchmark.instantiateBabelParserBigCallStack            avgt
> >>>>> 7  20.576 ± 0.069  us/op
> >>>>> BabelParserBenchmark.instantiateBabelParserBigCallStackErrorFixed  avgt
> >>>>> 7  12.479 ± 0.068  us/op
> >>>>>
> >>>>> For the smallest error stack possible with JMH, the current version is
> >>> ~15%
> >>>>> slower than with the fix.
> >>>>> With a bigger stack - instantiation that happens on depth ~100 - the
> >>>>> current version gets even slower. With the fix the time is stable.
> >>>>>
> >>>>> Note I'm not sure my method to make the stack bigger is realistic. I
> >>> just
> >>>>> do some recursive calls.
> >>>>> Also to fix the issue I just copied the SqlBabelParserImpl and fixed the
> >>>>> stack trace call, so the results above may not match exactly the change
> >>>>> that will be observed by upgrading to 7.0.13.
> >>>>>
> >>>>> I can create a PR that adds the benchmark (without the hotfix
> >>> *ErrorFixed
> >>>>> one) if it's okay for you.
> >>>>>
> >>>>>>> On Tue, Jul 9, 2024 at 4:07 AM Julian Hyde <jh...@apache.org> wrote:
> >>>>>>
> >>>>>> I've started work on
> >>>>>> https://issues.apache.org/jira/browse/CALCITE-5541, to upgrade to
> >>>>>> 7.0.13. There are still a few errors, but the inefficiency in the
> >>>>>> subclass of Error seems to have been solved.
> >>>>>>
> >>>>>> Can someone work on the microbenchmark? I would like to see numbers
> >>>>>> before and after this fix.
> >>>>>>
> >>>>>> Julian
> >>>>>>
> >>>>>>
> >>>>>>> On Mon, Jul 8, 2024 at 1:51 PM Mihai Budiu <mbu...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> There is an issue to upgrade JavaCC:
> >>>>>> https://issues.apache.org/jira/browse/CALCITE-5541
> >>>>>>>
> >>>>>>> It may be a worthwhile effort to do it.
> >>>>>>>
> >>>>>>> Mihai
> >>>>>>> ________________________________
> >>>>>>> From: Cyril DE CATHEU <cy...@startree.ai.INVALID>
> >>>>>>> Sent: Friday, July 5, 2024 8:55 AM
> >>>>>>> To: dev@calcite.apache.org <dev@calcite.apache.org>
> >>>>>>> Subject: Re: SqlBabelParserImpl instantiation is slow because it
> >>>>>> instantiates a java.lang.Error
> >>>>>>>
> >>>>>>> Me again,
> >>>>>>>
> >>>>>>> So this LookaheadSuccess comes from JavaCC.
> >>>>>>> It seems this issue was solved in JavaCC some time ago, from JavaCC
> >>>>>> 7.0.5:
> >>>>>>> https://github.com/javacc/javacc/pull/99
> >>>>>>>
> >>>>>>
> >>> https://github.com/javacc/javacc/blob/46aa917fda0d0726690e384632e5cefae46bab68/docs/release-notes.md?plain=1#L163
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Jul 5, 2024 at 5:44 PM Cyril DE CATHEU <cy...@startree.ai
> >>>>>> <mailto:cy...@startree.ai>> wrote:
> >>>>>>> Had a deeper look, it seems the LookaheadSuccess instance is exploited
> >>>>>> for conditional branching
> >>>>>>>
> >>>>>>> Eg:
> >>>>>>>
> >>>>>>> try { return !jj_3_2(); }
> >>>>>>> catch(LookaheadSuccess ls) { return true; }
> >>>>>>> finally { jj_save(1, xla); }
> >>>>>>>
> >>>>>>> So I guess lazy instantiation will not help because the
> >>> LookaheadSuccess
> >>>>>> will be instantiated anyway.
> >>>>>>> If we know this error is always caught internally in the class, could
> >>> we
> >>>>>> override the constructor to not do a call to
> >>> Throwable.fillInStackTrace();
> >>>>>> ? which is what makes this slow.
> >>>>>>>
> >>>>>>> On Fri, Jul 5, 2024 at 10:53 AM Cyril DE CATHEU <cy...@startree.ai
> >>>>>> <mailto:cy...@startree.ai>> wrote:
> >>>>>>> Hey,
> >>>>>>>
> >>>>>>> I'm using Calcite to parse snippets of SQL.
> >>>>>>> My (simplified) function looks like this:
> >>>>>>>
> >>>>>>> public static SqlNode expressionToNode(final String sqlExpression,
> >>>>>>>  final SqlParser.Config config) {
> >>>>>>> SqlParser sqlParser = SqlParser.create(sqlExpression, config);
> >>>>>>> try {
> >>>>>>>  return sqlParser.parseExpression();
> >>>>>>> } catch (SqlParseException e) {
> >>>>>>>   //do something
> >>>>>>> }
> >>>>>>> }
> >>>>>>>
> >>>>>>> and it is called many times.
> >>>>>>> The parser set in the config is Babel, so the parser instantiated is
> >>>>>> SqlBabelParserImpl.
> >>>>>>> A SqlBabelParserImpl is instantiated every time this function is
> >>> called.
> >>>>>> From what I understand this parser cannot be re-used easily to parse
> >>>>>> different expressions but let me know if I'm incorrect.
> >>>>>>>
> >>>>>>> The issue I'm encountering is the instantiation of this class is
> >>> pretty
> >>>>>> slow because one of the instance field  jj_ls extends java.lang.Error
> >>> and
> >>>>>> is instantiated when the SqlBabelParserImpl is instantiated.
> >>>>>>>
> >>>>>>> [Screenshot 2024-07-05 at 10.37.17.png]
> >>>>>>>
> >>>>>>> here is the extract of the generated code in SqlBabelParserImpl:
> >>>>>>>
> >>>>>>> static private final class LookaheadSuccess extends java.lang.Error {
> >>> }
> >>>>>>>
> >>>>>>> final private LookaheadSuccess jj_ls = new LookaheadSuccess();
> >>>>>>>
> >>>>>>> final private boolean jj_scan_token(int kind) {
> >>>>>>> if (jj_scanpos == jj_lastpos) {
> >>>>>>>  jj_la--;
> >>>>>>>  if (jj_scanpos.next == null) {
> >>>>>>>    jj_lastpos = jj_scanpos = jj_scanpos.next =
> >>>>>> token_source.getNextToken();
> >>>>>>>  } else {
> >>>>>>>    jj_lastpos = jj_scanpos = jj_scanpos.next;
> >>>>>>>  }
> >>>>>>> } else {
> >>>>>>>  jj_scanpos = jj_scanpos.next;
> >>>>>>> }
> >>>>>>> if (jj_rescan) {
> >>>>>>>  int i = 0; Token tok = token;
> >>>>>>>  while (tok != null && tok != jj_scanpos) { i++; tok = tok.next; }
> >>>>>>>  if (tok != null) jj_add_error_token(kind, i);
> >>>>>>> }
> >>>>>>> if (jj_scanpos.kind != kind) return true;
> >>>>>>> if (jj_la == 0 && jj_scanpos == jj_lastpos) throw jj_ls;
> >>>>>>> return false;
> >>>>>>> }
> >>>>>>>
> >>>>>>>
> >>>>>>> jj_ls is only used in an error case. (before last line of
> >>> jj_scan_token)
> >>>>>>> I'm assuming the re-use of a single error instance is done on purpose,
> >>>>>>> but could we consider instantiate jj_ls lazily?
> >>>>>>>
> >>>>>>> I can try to do this but I'm a bit lost in the code generation
> >>> codebase
> >>>>>> so I would need some pointers.
> >>>>>>>
> >>>>>>> Have a nice day.
> >>>>>>> --
> >>>>>>>
> >>>>>>> [StarTree] <https://startree.ai>
> >>>>>>> Cyril de Catheu
> >>>>>>> Software Engineer
> >>>>>>> +33 (684) 829-908<tel:+33+(684)+829-908>
> >>>>>>> Follow us: [RSS] <https://www.startree.ai/blogs> [LinkedIn] <
> >>>>>> https://www.linkedin.com/company/startreedata/> [Twitter] <
> >>>>>> https://twitter.com/startreedata> [Slack] <https://stree.ai/slack>
> >>>>>> [YouTube] <https://youtube.com/StarTreeData>
> >>>>>>>
> >>>>>>> [Try StarTree Cloud Today]<
> >>>>>>
> >>> https://get.startree.ai/startree-cloud?utm_campaign=byoc-edition-of-startree-cloud&utm_source=email&utm_content=startree-employee-email-signatures
> >>>>>>>
> >>>>>>
> >>>

Re: SqlBabelParserImpl instantiation is slow because it instantiates a java.lang.Error

Reply via email to