Or TL;DR: If your parser is creaking, don’t blame the parser, simplify your grammar.
> On Mar 31, 2019, at 2:56 PM, Julian Hyde <[email protected]> wrote: > > Welcome the world of LL parsers (e.g. JavaCC and Antlr). They are less > powerful than LR parsers (e.g. yacc) because they have to know that a > production is going to succeed before they jump into it, but they are easier > to use, and tend to produce human-readable error messages. > > Sometimes you can use a bit of extra lookahead — i.e. read a small, fixed > number of tokens — to make the decision before you jump, but quite often you > can’t. > > Sometimes semantic lookahead (i.e. try to parse and backtrack if you can’t) > is touted as the solution but it can have truly awful performance. (During > the week I bumped into https://issues.apache.org/jira/browse/HIVE-18624 > <https://issues.apache.org/jira/browse/HIVE-18624>, where the running time is > literally exponential in the number of parentheses.) > > The best solution is to create your own non-deterministic states, e.g. the > state of having just seen a parentheses but not knowing yet whether it was an > expression or a sub-query. It is not easy to code, but it works. You’re > basically emulating what an LR parser would do. > > Julian > > > >> On Mar 31, 2019, at 2:43 PM, Muhammad Gelbana <[email protected] >> <mailto:[email protected]>> wrote: >> >> I think upgrading JavaCC maven plugin could break things if: >> 1. The grammar changed, and I highly doubt that, or, >> 2. We have bugs in our grammar that were fixed in the upgraded JavaCC >> library. >> >> Thanks, >> Gelbana >> >> >> On Sun, Mar 31, 2019 at 11:41 PM Muhammad Gelbana <[email protected] >> <mailto:[email protected]>> >> wrote: >> >>> Thanks for the suggestion Stamatis, but that didn't work for me. It caused >>> compilation errors in SqlParserImpl and I couldn't see a way to resolve >>> them. >>> >>> Thanks, >>> Gelbana >>> >>> >>> On Sun, Mar 31, 2019 at 6:01 PM Stamatis Zampetakis <[email protected] >>> <mailto:[email protected]>> >>> wrote: >>> >>>> I am not sure why the update breaks so many tests neither if there is a >>>> problem with the LOOKAHEAD but regarding CALCITE-2844, I would be inclined >>>> to modify lines around [3] to make it work. >>>> >>>> In particular, I would try to make <TABLE> with parentheses optional (just >>>> in case you didn't try this so far). >>>> >>>> Best, >>>> Stamatis >>>> >>>> [3] >>>> >>>> https://github.com/apache/calcite/blob/81fa5314e94e86b6cf8df244b03f9d57c884f54d/core/src/main/codegen/templates/Parser.jj#L1884 >>>> >>>> <https://github.com/apache/calcite/blob/81fa5314e94e86b6cf8df244b03f9d57c884f54d/core/src/main/codegen/templates/Parser.jj#L1884> >>>> >>>> Στις Κυρ, 31 Μαρ 2019 στις 5:16 μ.μ., ο/η Muhammad Gelbana < >>>> [email protected]> έγραψε: >>>> >>>>> I was trying to support selecting from table functions[1]. I tried >>>>> extending TableRef2[2] (Production ?) to support table functions by >>>> adding >>>>> >>>>>> LOOKAHEAD(3) >>>>>> >>>>> tableRef = TableFunctionCall(getPos())) >>>>>> >>>>> | >>>>>> >>>>> before >>>>> >>>>>> LOOKAHEAD(2) >>>>>> tableRef = CompoundIdentifier() >>>>>> >>>>> >>>>> but it broke other tests. I tried putting my modification at the end of >>>> the >>>>> choices while increasing the CompoundIdentifier() lookahead to 3 to >>>> avoid >>>>> that choice when it faces the left bracket, but it didn't work too. I >>>> tried >>>>> setting aggresively high lookahead values such as 50, and it didn't work >>>>> too. I won't be surprised if I'm doing anything wrong as I'm not >>>> accustomed >>>>> to working with grammar files anyway. >>>>> >>>>> The only thing I'm considering now is to create a new production (I'm >>>> not >>>>> sure if I'm using this word correctly) such as TableRef3 and have that >>>>> going down the common path between TableFunctionCall() and >>>>> CompoundIdentifier() because TableFunctionCall() eventually attempts to >>>>> cosnume a CompoundIdentifier(). This way I won't have to bother about >>>>> tuning lookaheads I suppose. >>>>> >>>>> I can create a branch of what I've accomplished so far if you wish. >>>>> >>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2844 >>>>> [2] >>>>> >>>>> >>>> https://github.com/apache/calcite/blob/master/core/src/main/codegen/templates/Parser.jj#L1811 >>>>> >>>>> Thanks, >>>>> Gelbana >>>>> >>>>> >>>>> On Sun, Mar 31, 2019 at 4:15 PM Hongze Zhang <[email protected]> wrote: >>>>> >>>>>> Just out of my curiosity, could you please share your case about >>>>>> "LOOKAHEAD doest not work as expected"? Does changing to JavaCC 5.0 >>>>>> actually fixes the problem? >>>>>> >>>>>> Thanks, >>>>>> Hongze >>>>>> >>>>>> >>>>>>> On Mar 31, 2019, at 19:17, Muhammad Gelbana <[email protected]> >>>>> wrote: >>>>>>> >>>>>>> I'm facing trouble with supporting selecting from table function for >>>>>> Babel >>>>>>> parser and I beleive that LOOKAHEAD isn't working as expected too. >>>>>>> I thought it might actually be a bug so I checked out the master >>>> branch >>>>>> and >>>>>>> updated the JavaCC maven plugin version to 2.6 (it's currently 2.4), >>>>> but >>>>>>> that failed *142* test cases and errored *9*. >>>>>>> >>>>>>> The plugin v2.4 imports the JavaCC library v4 >>>>>>> The plugin v2.6 imports the JavaCC library v5 >>>>>>> >>>>>>> Unfortunately the release notes for the JavaCC library are broken >>>> and >>>>> I'm >>>>>>> not aware of another source for the release notes for that project. >>>>>>> Should I open a Jira to upgrade that plugin version ? >>>>>>> >>>>>>> Thanks, >>>>>>> Gelbana >>>>>>> >>>>>>> >>>>>>> On Thu, Mar 28, 2019 at 4:18 AM Rui Li <[email protected]> >>>> wrote: >>>>>>> >>>>>>>> Thanks Hongze, that's good to know. >>>>>>>> >>>>>>>> On Thu, Mar 28, 2019 at 8:43 AM Hongze Zhang <[email protected]> >>>>> wrote: >>>>>>>> >>>>>>>>>> Besides, if I enable forceLaCheck, JavaCC suggests to use a >>>>> lookahead >>>>>>>> of >>>>>>>>> 3 >>>>>>>>>> or more. I guess we'd better get rid of these warnings if we >>>> want to >>>>>>>>> stick >>>>>>>>>> to lookahead(2). >>>>>>>>> >>>>>>>>> That makes sense. Actually we had a discussion[1] on moving to >>>>>>>>> "LOOKAHEAD=1", and seems we are close to finish it. By doing this >>>> we >>>>>> have >>>>>>>>> extra benefits that we don't need to turn forceLaCheck on and >>>> JavaCC >>>>>>>> should >>>>>>>>> give suggestions during maven build. >>>>>>>>> >>>>>>>>> Hongze >>>>>>>>> >>>>>>>>> >>>>>>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2847 >>>>>>>>> >>>>>>>>>> On Mar 27, 2019, at 10:40, Rui Li <[email protected]> wrote: >>>>>>>>>> >>>>>>>>>> Thanks Hongze for looking into the issue! Are you suggesting >>>> this is >>>>>>>> more >>>>>>>>>> likely to be a JavaCC bug? >>>>>>>>>> I filed a ticket anyway in case we want to further track it: >>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-2957 >>>>>>>>>> Besides, if I enable forceLaCheck, JavaCC suggests to use a >>>>> lookahead >>>>>>>> of >>>>>>>>> 3 >>>>>>>>>> or more. I guess we'd better get rid of these warnings if we >>>> want to >>>>>>>>> stick >>>>>>>>>> to lookahead(2). >>>>>>>>>> >>>>>>>>>> On Wed, Mar 27, 2019 at 8:54 AM Hongze Zhang <[email protected]> >>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Thanks, Yuzhao. >>>>>>>>>>> >>>>>>>>>>> Since the more generic problem is that the production "E()"[1] >>>>> causes >>>>>>>>> the >>>>>>>>>>> parent production's looking ahead returns too early, I tried to >>>>> find >>>>>> a >>>>>>>>> bad >>>>>>>>>>> case of the same reason under current default setting >>>> LOOKAHEAD=2 >>>>> but >>>>>>>> it >>>>>>>>>>> seems that under this number we didn't have a chance to meet the >>>>>> issue >>>>>>>>> yet. >>>>>>>>>>> >>>>>>>>>>> So after that I suggest to not to treat this as a Calcite's >>>> issue >>>>>>>>>>> currently. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Hongze >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> https://github.com/apache/calcite/blob/11c067f9992d9c8bc29e2326dd8b299ad1e9dbdc/core/src/main/codegen/templates/Parser.jj#L335 >>>>>>>>>>> >>>>>>>>>>>> On Mar 26, 2019, at 20:42, Yuzhao Chen <[email protected]> >>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Maybe we should fire a jira if it is a bug. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Danny Chan >>>>>>>>>>>> 在 2019年3月26日 +0800 PM8:33,Hongze Zhang <[email protected]>,写道: >>>>>>>>>>>>> Ops, correct a typo: >>>>>>>>>>>>> >>>>>>>>>>>>> "... after uncommenting a line ..." -> "... after commenting a >>>>> line >>>>>>>>>>>>> ...". >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Hongze >>>>>>>>>>>>> >>>>>>>>>>>>> ------ Original Message ------ >>>>>>>>>>>>> From: "Hongze Zhang" <[email protected]> >>>>>>>>>>>>> To: [email protected] >>>>>>>>>>>>> Sent: 2019/3/26 19:28:08 >>>>>>>>>>>>> Subject: Re: Calcite doesn't work with LOOKAHEAD(3) >>>>>>>>>>>>> >>>>>>>>>>>>>> Firstly, thank you very much for sharing the case, Rui! >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have run a test with the SQL you provided and also run into >>>>> the >>>>>>>>> same >>>>>>>>>>> exception (under a global LOOKAHEAD 3). After debugging the >>>>> generated >>>>>>>>>>> parser code, I think the problem is probably in the generated >>>>>>>> LOOKAHEAD >>>>>>>>>>> method SqlParserImpl#jj_3R_42(): >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> final private boolean jj_3R_42() { >>>>>>>>>>>>>>> if (!jj_rescan) trace_call("SqlSelect(LOOKING AHEAD...)"); >>>>>>>>>>>>>>> if (jj_scan_token(SELECT)) { if (!jj_rescan) >>>>>>>>>>> trace_return("SqlSelect(LOOKAHEAD FAILED)"); return true; } >>>>>>>>>>>>>>> if (jj_3R_190()) { if (!jj_rescan) >>>>>>>> trace_return("SqlSelect(LOOKAHEAD >>>>>>>>>>> FAILED)"); return true; } >>>>>>>>>>>>>>> { if (!jj_rescan) trace_return("SqlSelect(LOOKAHEAD >>>>> SUCCEEDED)"); >>>>>>>>>>> return false; } >>>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> The LOOKAHEAD method checks only a single token <SELECT>. >>>> This >>>>> is >>>>>>>>>>> definitely not enough since we have already set the number to 3. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Unfortunately I didn't find a root cause so far, but after >>>>>>>>>>> uncommenting a line[1] in production "SqlSelect()" then >>>> everything >>>>>>>> goes >>>>>>>>>>> back to normal. I'm inclined to believe JavaCC has some >>>> unexpected >>>>>>>>> behavior >>>>>>>>>>> when dealing with LOOKAHEAD on a production with the shape like >>>>>>>>>>> "SqlSelectKeywords()"[2]. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please feel free to log a JIRA ticket with which we can track >>>>>>>> further >>>>>>>>>>> information of the issue. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> Hongze >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> [1] >>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> https://github.com/apache/calcite/blob/1b430721c0d9e22b2252ffcd893b42959cb7966c/core/src/main/codegen/templates/Parser.jj#L1030 >>>>>>>>>>>>>> [2] >>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> https://github.com/apache/calcite/blob/1b430721c0d9e22b2252ffcd893b42959cb7966c/core/src/main/codegen/templates/Parser.jj#L288 >>>>>>>>>>>>>> >>>>>>>>>>>>>> ------ Original Message ------ >>>>>>>>>>>>>> From: "Rui Li" <[email protected]> >>>>>>>>>>>>>> To: [email protected] >>>>>>>>>>>>>> Sent: 2019/3/26 16:53:44 >>>>>>>>>>>>>> Subject: Calcite doesn't work with LOOKAHEAD(3) >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm trying to extend Calcite grammar to support some custom >>>>>>>>>>> statements. And >>>>>>>>>>>>>>> I need to increase LOOKAHEAD to 3 to resolve some ambiguity. >>>>> But >>>>>>>>> when >>>>>>>>>>> I did >>>>>>>>>>>>>>> that, the parser fails to parse queries like: >>>>>>>>>>>>>>> * select t.key from (select key from src) t* >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> With exception: >>>>>>>>>>>>>>> *Caused by: >>>> org.apache.calcite.sql.parser.impl.ParseException:* >>>>>>>>>>>>>>> *Encountered "( select key" at line 1, column 19.* >>>>>>>>>>>>>>> *Was expecting one of:* >>>>>>>>>>>>>>> * <IDENTIFIER> ...* >>>>>>>>>>>>>>> * <QUOTED_IDENTIFIER> ...* >>>>>>>>>>>>>>> * <BACK_QUOTED_IDENTIFIER> ...* >>>>>>>>>>>>>>> * <BRACKET_QUOTED_IDENTIFIER> ...* >>>>>>>>>>>>>>> * <UNICODE_QUOTED_IDENTIFIER> ...* >>>>>>>>>>>>>>> * "LATERAL" ...* >>>>>>>>>>>>>>> * "(" "WITH" ...* >>>>>>>>>>>>>>> *...* >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So I'm wondering whether there's some limitation on the >>>>> LOOKAHEAD >>>>>>>> we >>>>>>>>>>> can >>>>>>>>>>>>>>> use? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Best regards! >>>>>>>>>>>>>>> Rui Li >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Best regards! >>>>>>>>>> Rui Li >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Best regards! >>>>>>>> Rui Li >>>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >
