A PR with the benchmark would be great. If it is based on jmh I will merge it to main, so others can run the benchmark in future, and maybe do more performance optimizations on the parser.
By the way, it can be on whichever parser variant (core, babel, server) is most convenient. I’m pretty sure they have the same performance issues. I’m continuing to work on upgrading JavaCC to 7.0.13, and should have a PR ready soon. Julian > On Jul 10, 2024, at 03:54, Cyril DE CATHEU <cy...@startree.ai.invalid> wrote: > > Hey Mihai and Julian, > > Thanks for looking at this. > The performance hit depends on the size of the stack. > Here are the numbers I have for the moment: (*not run in a good benchmark > setup* but gives an idea) > > Benchmark Mode > Cnt Score Error Units > BabelParserBenchmark.instantiateBabelParser avgt > 7 14.128 ± 0.041 us/op > BabelParserBenchmark.instantiateBabelParserErrorFixed avgt > 7 12.054 ± 0.039 us/op > > BabelParserBenchmark.instantiateBabelParserBigCallStack avgt > 7 20.576 ± 0.069 us/op > BabelParserBenchmark.instantiateBabelParserBigCallStackErrorFixed avgt > 7 12.479 ± 0.068 us/op > > For the smallest error stack possible with JMH, the current version is ~15% > slower than with the fix. > With a bigger stack - instantiation that happens on depth ~100 - the > current version gets even slower. With the fix the time is stable. > > Note I'm not sure my method to make the stack bigger is realistic. I just > do some recursive calls. > Also to fix the issue I just copied the SqlBabelParserImpl and fixed the > stack trace call, so the results above may not match exactly the change > that will be observed by upgrading to 7.0.13. > > I can create a PR that adds the benchmark (without the hotfix *ErrorFixed > one) if it's okay for you. > >> On Tue, Jul 9, 2024 at 4:07 AM Julian Hyde <jh...@apache.org> wrote: >> >> I've started work on >> https://issues.apache.org/jira/browse/CALCITE-5541, to upgrade to >> 7.0.13. There are still a few errors, but the inefficiency in the >> subclass of Error seems to have been solved. >> >> Can someone work on the microbenchmark? I would like to see numbers >> before and after this fix. >> >> Julian >> >> >>> On Mon, Jul 8, 2024 at 1:51 PM Mihai Budiu <mbu...@gmail.com> wrote: >>> >>> There is an issue to upgrade JavaCC: >> https://issues.apache.org/jira/browse/CALCITE-5541 >>> >>> It may be a worthwhile effort to do it. >>> >>> Mihai >>> ________________________________ >>> From: Cyril DE CATHEU <cy...@startree.ai.INVALID> >>> Sent: Friday, July 5, 2024 8:55 AM >>> To: dev@calcite.apache.org <dev@calcite.apache.org> >>> Subject: Re: SqlBabelParserImpl instantiation is slow because it >> instantiates a java.lang.Error >>> >>> Me again, >>> >>> So this LookaheadSuccess comes from JavaCC. >>> It seems this issue was solved in JavaCC some time ago, from JavaCC >> 7.0.5: >>> https://github.com/javacc/javacc/pull/99 >>> >> https://github.com/javacc/javacc/blob/46aa917fda0d0726690e384632e5cefae46bab68/docs/release-notes.md?plain=1#L163 >>> >>> >>> >>> On Fri, Jul 5, 2024 at 5:44 PM Cyril DE CATHEU <cy...@startree.ai >> <mailto:cy...@startree.ai>> wrote: >>> Had a deeper look, it seems the LookaheadSuccess instance is exploited >> for conditional branching >>> >>> Eg: >>> >>> try { return !jj_3_2(); } >>> catch(LookaheadSuccess ls) { return true; } >>> finally { jj_save(1, xla); } >>> >>> So I guess lazy instantiation will not help because the LookaheadSuccess >> will be instantiated anyway. >>> If we know this error is always caught internally in the class, could we >> override the constructor to not do a call to Throwable.fillInStackTrace(); >> ? which is what makes this slow. >>> >>> On Fri, Jul 5, 2024 at 10:53 AM Cyril DE CATHEU <cy...@startree.ai >> <mailto:cy...@startree.ai>> wrote: >>> Hey, >>> >>> I'm using Calcite to parse snippets of SQL. >>> My (simplified) function looks like this: >>> >>> public static SqlNode expressionToNode(final String sqlExpression, >>> final SqlParser.Config config) { >>> SqlParser sqlParser = SqlParser.create(sqlExpression, config); >>> try { >>> return sqlParser.parseExpression(); >>> } catch (SqlParseException e) { >>> //do something >>> } >>> } >>> >>> and it is called many times. >>> The parser set in the config is Babel, so the parser instantiated is >> SqlBabelParserImpl. >>> A SqlBabelParserImpl is instantiated every time this function is called. >> From what I understand this parser cannot be re-used easily to parse >> different expressions but let me know if I'm incorrect. >>> >>> The issue I'm encountering is the instantiation of this class is pretty >> slow because one of the instance field jj_ls extends java.lang.Error and >> is instantiated when the SqlBabelParserImpl is instantiated. >>> >>> [Screenshot 2024-07-05 at 10.37.17.png] >>> >>> here is the extract of the generated code in SqlBabelParserImpl: >>> >>> static private final class LookaheadSuccess extends java.lang.Error { } >>> >>> final private LookaheadSuccess jj_ls = new LookaheadSuccess(); >>> >>> final private boolean jj_scan_token(int kind) { >>> if (jj_scanpos == jj_lastpos) { >>> jj_la--; >>> if (jj_scanpos.next == null) { >>> jj_lastpos = jj_scanpos = jj_scanpos.next = >> token_source.getNextToken(); >>> } else { >>> jj_lastpos = jj_scanpos = jj_scanpos.next; >>> } >>> } else { >>> jj_scanpos = jj_scanpos.next; >>> } >>> if (jj_rescan) { >>> int i = 0; Token tok = token; >>> while (tok != null && tok != jj_scanpos) { i++; tok = tok.next; } >>> if (tok != null) jj_add_error_token(kind, i); >>> } >>> if (jj_scanpos.kind != kind) return true; >>> if (jj_la == 0 && jj_scanpos == jj_lastpos) throw jj_ls; >>> return false; >>> } >>> >>> >>> jj_ls is only used in an error case. (before last line of jj_scan_token) >>> I'm assuming the re-use of a single error instance is done on purpose, >>> but could we consider instantiate jj_ls lazily? >>> >>> I can try to do this but I'm a bit lost in the code generation codebase >> so I would need some pointers. >>> >>> Have a nice day. >>> -- >>> >>> [StarTree] <https://startree.ai> >>> Cyril de Catheu >>> Software Engineer >>> +33 (684) 829-908<tel:+33+(684)+829-908> >>> Follow us: [RSS] <https://www.startree.ai/blogs> [LinkedIn] < >> https://www.linkedin.com/company/startreedata/> [Twitter] < >> https://twitter.com/startreedata> [Slack] <https://stree.ai/slack> >> [YouTube] <https://youtube.com/StarTreeData> >>> >>> [Try StarTree Cloud Today]< >> https://get.startree.ai/startree-cloud?utm_campaign=byoc-edition-of-startree-cloud&utm_source=email&utm_content=startree-employee-email-signatures >>> >>