https://github.com/apache/calcite/pull/3853

On Wed, Jul 10, 2024 at 5:29 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:

> PS Your method of increasing stack size by making recursive calls seems
> valid. Your experiments show an improvement before and after the fix.
> That’s good enough for me.
>
> I plan to use your benchmark to check that upgrading JavaCC doesn’t
> introduce other performance problems.
>
> Julian
>
> > On Jul 10, 2024, at 08:18, Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >
> > A PR with the benchmark would be great. If it is based on jmh I will
> merge it to main, so others can run the benchmark in future, and maybe do
> more performance optimizations on the parser.
> >
> > By the way, it can be on whichever parser variant (core, babel, server)
> is most convenient. I’m pretty sure they have the same performance issues.
> >
> > I’m continuing to work on upgrading JavaCC to 7.0.13, and should have a
> PR ready soon.
> >
> > Julian
> >
> >> On Jul 10, 2024, at 03:54, Cyril DE CATHEU <cy...@startree.ai.invalid>
> wrote:
> >>
> >> Hey Mihai and Julian,
> >>
> >> Thanks for looking at this.
> >> The performance hit depends on the size of the stack.
> >> Here are the numbers I have for the moment: (*not run in a good
> benchmark
> >> setup* but gives an idea)
> >>
> >> Benchmark                                                          Mode
> >> Cnt   Score   Error  Units
> >> BabelParserBenchmark.instantiateBabelParser                        avgt
> >> 7  14.128 ± 0.041  us/op
> >> BabelParserBenchmark.instantiateBabelParserErrorFixed              avgt
> >> 7  12.054 ± 0.039  us/op
> >>
> >> BabelParserBenchmark.instantiateBabelParserBigCallStack            avgt
> >> 7  20.576 ± 0.069  us/op
> >> BabelParserBenchmark.instantiateBabelParserBigCallStackErrorFixed  avgt
> >> 7  12.479 ± 0.068  us/op
> >>
> >> For the smallest error stack possible with JMH, the current version is
> ~15%
> >> slower than with the fix.
> >> With a bigger stack - instantiation that happens on depth ~100 - the
> >> current version gets even slower. With the fix the time is stable.
> >>
> >> Note I'm not sure my method to make the stack bigger is realistic. I
> just
> >> do some recursive calls.
> >> Also to fix the issue I just copied the SqlBabelParserImpl and fixed the
> >> stack trace call, so the results above may not match exactly the change
> >> that will be observed by upgrading to 7.0.13.
> >>
> >> I can create a PR that adds the benchmark (without the hotfix
> *ErrorFixed
> >> one) if it's okay for you.
> >>
> >>>> On Tue, Jul 9, 2024 at 4:07 AM Julian Hyde <jh...@apache.org> wrote:
> >>>
> >>> I've started work on
> >>> https://issues.apache.org/jira/browse/CALCITE-5541, to upgrade to
> >>> 7.0.13. There are still a few errors, but the inefficiency in the
> >>> subclass of Error seems to have been solved.
> >>>
> >>> Can someone work on the microbenchmark? I would like to see numbers
> >>> before and after this fix.
> >>>
> >>> Julian
> >>>
> >>>
> >>>> On Mon, Jul 8, 2024 at 1:51 PM Mihai Budiu <mbu...@gmail.com> wrote:
> >>>>
> >>>> There is an issue to upgrade JavaCC:
> >>> https://issues.apache.org/jira/browse/CALCITE-5541
> >>>>
> >>>> It may be a worthwhile effort to do it.
> >>>>
> >>>> Mihai
> >>>> ________________________________
> >>>> From: Cyril DE CATHEU <cy...@startree.ai.INVALID>
> >>>> Sent: Friday, July 5, 2024 8:55 AM
> >>>> To: dev@calcite.apache.org <dev@calcite.apache.org>
> >>>> Subject: Re: SqlBabelParserImpl instantiation is slow because it
> >>> instantiates a java.lang.Error
> >>>>
> >>>> Me again,
> >>>>
> >>>> So this LookaheadSuccess comes from JavaCC.
> >>>> It seems this issue was solved in JavaCC some time ago, from JavaCC
> >>> 7.0.5:
> >>>> https://github.com/javacc/javacc/pull/99
> >>>>
> >>>
> https://github.com/javacc/javacc/blob/46aa917fda0d0726690e384632e5cefae46bab68/docs/release-notes.md?plain=1#L163
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Jul 5, 2024 at 5:44 PM Cyril DE CATHEU <cy...@startree.ai
> >>> <mailto:cy...@startree.ai>> wrote:
> >>>> Had a deeper look, it seems the LookaheadSuccess instance is exploited
> >>> for conditional branching
> >>>>
> >>>> Eg:
> >>>>
> >>>> try { return !jj_3_2(); }
> >>>> catch(LookaheadSuccess ls) { return true; }
> >>>> finally { jj_save(1, xla); }
> >>>>
> >>>> So I guess lazy instantiation will not help because the
> LookaheadSuccess
> >>> will be instantiated anyway.
> >>>> If we know this error is always caught internally in the class, could
> we
> >>> override the constructor to not do a call to
> Throwable.fillInStackTrace();
> >>> ? which is what makes this slow.
> >>>>
> >>>> On Fri, Jul 5, 2024 at 10:53 AM Cyril DE CATHEU <cy...@startree.ai
> >>> <mailto:cy...@startree.ai>> wrote:
> >>>> Hey,
> >>>>
> >>>> I'm using Calcite to parse snippets of SQL.
> >>>> My (simplified) function looks like this:
> >>>>
> >>>> public static SqlNode expressionToNode(final String sqlExpression,
> >>>>   final SqlParser.Config config) {
> >>>> SqlParser sqlParser = SqlParser.create(sqlExpression, config);
> >>>> try {
> >>>>   return sqlParser.parseExpression();
> >>>> } catch (SqlParseException e) {
> >>>>    //do something
> >>>> }
> >>>> }
> >>>>
> >>>> and it is called many times.
> >>>> The parser set in the config is Babel, so the parser instantiated is
> >>> SqlBabelParserImpl.
> >>>> A SqlBabelParserImpl is instantiated every time this function is
> called.
> >>> From what I understand this parser cannot be re-used easily to parse
> >>> different expressions but let me know if I'm incorrect.
> >>>>
> >>>> The issue I'm encountering is the instantiation of this class is
> pretty
> >>> slow because one of the instance field  jj_ls extends java.lang.Error
> and
> >>> is instantiated when the SqlBabelParserImpl is instantiated.
> >>>>
> >>>> [Screenshot 2024-07-05 at 10.37.17.png]
> >>>>
> >>>> here is the extract of the generated code in SqlBabelParserImpl:
> >>>>
> >>>> static private final class LookaheadSuccess extends java.lang.Error {
> }
> >>>>
> >>>> final private LookaheadSuccess jj_ls = new LookaheadSuccess();
> >>>>
> >>>> final private boolean jj_scan_token(int kind) {
> >>>> if (jj_scanpos == jj_lastpos) {
> >>>>   jj_la--;
> >>>>   if (jj_scanpos.next == null) {
> >>>>     jj_lastpos = jj_scanpos = jj_scanpos.next =
> >>> token_source.getNextToken();
> >>>>   } else {
> >>>>     jj_lastpos = jj_scanpos = jj_scanpos.next;
> >>>>   }
> >>>> } else {
> >>>>   jj_scanpos = jj_scanpos.next;
> >>>> }
> >>>> if (jj_rescan) {
> >>>>   int i = 0; Token tok = token;
> >>>>   while (tok != null && tok != jj_scanpos) { i++; tok = tok.next; }
> >>>>   if (tok != null) jj_add_error_token(kind, i);
> >>>> }
> >>>> if (jj_scanpos.kind != kind) return true;
> >>>> if (jj_la == 0 && jj_scanpos == jj_lastpos) throw jj_ls;
> >>>> return false;
> >>>> }
> >>>>
> >>>>
> >>>> jj_ls is only used in an error case. (before last line of
> jj_scan_token)
> >>>> I'm assuming the re-use of a single error instance is done on purpose,
> >>>> but could we consider instantiate jj_ls lazily?
> >>>>
> >>>> I can try to do this but I'm a bit lost in the code generation
> codebase
> >>> so I would need some pointers.
> >>>>
> >>>> Have a nice day.
> >>>> --
> >>>>
> >>>> [StarTree] <https://startree.ai>
> >>>> Cyril de Catheu
> >>>> Software Engineer
> >>>> +33 (684) 829-908<tel:+33+(684)+829-908>
> >>>> Follow us: [RSS] <https://www.startree.ai/blogs> [LinkedIn] <
> >>> https://www.linkedin.com/company/startreedata/> [Twitter] <
> >>> https://twitter.com/startreedata> [Slack] <https://stree.ai/slack>
> >>> [YouTube] <https://youtube.com/StarTreeData>
> >>>>
> >>>> [Try StarTree Cloud Today]<
> >>>
> https://get.startree.ai/startree-cloud?utm_campaign=byoc-edition-of-startree-cloud&utm_source=email&utm_content=startree-employee-email-signatures
> >>>>
> >>>
>

Reply via email to