Or TL;DR:

If your parser is creaking, don’t blame the parser, simplify your grammar.

> On Mar 31, 2019, at 2:56 PM, Julian Hyde <[email protected]> wrote:
> 
> Welcome the world of LL parsers (e.g. JavaCC and Antlr). They are less 
> powerful than LR parsers (e.g. yacc) because they have to know that a 
> production is going to succeed before they jump into it, but they are easier 
> to use, and tend to produce human-readable error messages.
> 
> Sometimes you can use a bit of extra lookahead — i.e. read a small, fixed 
> number of tokens — to make the decision before you jump, but quite often you 
> can’t.
> 
> Sometimes semantic lookahead (i.e. try to parse and backtrack if you can’t) 
> is touted as the solution but it can have truly awful performance. (During 
> the week I bumped into https://issues.apache.org/jira/browse/HIVE-18624 
> <https://issues.apache.org/jira/browse/HIVE-18624>, where the running time is 
> literally exponential in the number of parentheses.)
> 
> The best solution is to create your own non-deterministic states, e.g. the 
> state of having just seen a parentheses but not knowing yet whether it was an 
> expression or a sub-query. It is not easy to code, but it works. You’re 
> basically emulating what an LR parser would do.
> 
> Julian
>  
> 
> 
>> On Mar 31, 2019, at 2:43 PM, Muhammad Gelbana <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> I think upgrading JavaCC maven plugin could break things if:
>> 1. The grammar changed, and I highly doubt that, or,
>> 2. We have bugs in our grammar that were fixed in the upgraded JavaCC
>> library.
>> 
>> Thanks,
>> Gelbana
>> 
>> 
>> On Sun, Mar 31, 2019 at 11:41 PM Muhammad Gelbana <[email protected] 
>> <mailto:[email protected]>>
>> wrote:
>> 
>>> Thanks for the suggestion Stamatis, but that didn't work for me. It caused
>>> compilation errors in SqlParserImpl and I couldn't see a way to resolve
>>> them.
>>> 
>>> Thanks,
>>> Gelbana
>>> 
>>> 
>>> On Sun, Mar 31, 2019 at 6:01 PM Stamatis Zampetakis <[email protected] 
>>> <mailto:[email protected]>>
>>> wrote:
>>> 
>>>> I am not sure why the update breaks so many tests neither if there is a
>>>> problem with the LOOKAHEAD but regarding CALCITE-2844, I would be inclined
>>>> to modify lines around [3] to make it work.
>>>> 
>>>> In particular, I would try to make <TABLE> with parentheses optional (just
>>>> in case you didn't try this so far).
>>>> 
>>>> Best,
>>>> Stamatis
>>>> 
>>>> [3]
>>>> 
>>>> https://github.com/apache/calcite/blob/81fa5314e94e86b6cf8df244b03f9d57c884f54d/core/src/main/codegen/templates/Parser.jj#L1884
>>>>  
>>>> <https://github.com/apache/calcite/blob/81fa5314e94e86b6cf8df244b03f9d57c884f54d/core/src/main/codegen/templates/Parser.jj#L1884>
>>>> 
>>>> Στις Κυρ, 31 Μαρ 2019 στις 5:16 μ.μ., ο/η Muhammad Gelbana <
>>>> [email protected]> έγραψε:
>>>> 
>>>>> I was trying to support selecting from table functions[1]. I tried
>>>>> extending TableRef2[2] (Production ?) to support table functions by
>>>> adding
>>>>> 
>>>>>> LOOKAHEAD(3)
>>>>>> 
>>>>> tableRef = TableFunctionCall(getPos()))
>>>>>> 
>>>>> |
>>>>>> 
>>>>> before
>>>>> 
>>>>>> LOOKAHEAD(2)
>>>>>> tableRef = CompoundIdentifier()
>>>>>> 
>>>>> 
>>>>> but it broke other tests. I tried putting my modification at the end of
>>>> the
>>>>> choices while increasing the CompoundIdentifier() lookahead to 3 to
>>>> avoid
>>>>> that choice when it faces the left bracket, but it didn't work too. I
>>>> tried
>>>>> setting aggresively high lookahead values such as 50, and it didn't work
>>>>> too. I won't be surprised if I'm doing anything wrong as I'm not
>>>> accustomed
>>>>> to working with grammar files anyway.
>>>>> 
>>>>> The only thing I'm considering now is to create a new production (I'm
>>>> not
>>>>> sure if I'm using this word correctly) such as TableRef3 and have that
>>>>> going down the common path between TableFunctionCall() and
>>>>> CompoundIdentifier() because TableFunctionCall() eventually attempts to
>>>>> cosnume a CompoundIdentifier(). This way I won't have to bother about
>>>>> tuning lookaheads I suppose.
>>>>> 
>>>>> I can create a branch of what I've accomplished so far if you wish.
>>>>> 
>>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2844
>>>>> [2]
>>>>> 
>>>>> 
>>>> https://github.com/apache/calcite/blob/master/core/src/main/codegen/templates/Parser.jj#L1811
>>>>> 
>>>>> Thanks,
>>>>> Gelbana
>>>>> 
>>>>> 
>>>>> On Sun, Mar 31, 2019 at 4:15 PM Hongze Zhang <[email protected]> wrote:
>>>>> 
>>>>>> Just out of my curiosity, could you please share your case about
>>>>>> "LOOKAHEAD doest not work as expected"? Does changing to JavaCC 5.0
>>>>>> actually fixes the problem?
>>>>>> 
>>>>>> Thanks,
>>>>>> Hongze
>>>>>> 
>>>>>> 
>>>>>>> On Mar 31, 2019, at 19:17, Muhammad Gelbana <[email protected]>
>>>>> wrote:
>>>>>>> 
>>>>>>> I'm facing trouble with supporting selecting from table function for
>>>>>> Babel
>>>>>>> parser and I beleive that LOOKAHEAD isn't working as expected too.
>>>>>>> I thought it might actually be a bug so I checked out the master
>>>> branch
>>>>>> and
>>>>>>> updated the JavaCC maven plugin version to 2.6 (it's currently 2.4),
>>>>> but
>>>>>>> that failed *142* test cases and errored *9*.
>>>>>>> 
>>>>>>> The plugin v2.4 imports the JavaCC library v4
>>>>>>> The plugin v2.6 imports the JavaCC library v5
>>>>>>> 
>>>>>>> Unfortunately the release notes for the JavaCC library are broken
>>>> and
>>>>> I'm
>>>>>>> not aware of another source for the release notes for that project.
>>>>>>> Should I open a Jira to upgrade that plugin version ?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Gelbana
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Mar 28, 2019 at 4:18 AM Rui Li <[email protected]>
>>>> wrote:
>>>>>>> 
>>>>>>>> Thanks Hongze, that's good to know.
>>>>>>>> 
>>>>>>>> On Thu, Mar 28, 2019 at 8:43 AM Hongze Zhang <[email protected]>
>>>>> wrote:
>>>>>>>> 
>>>>>>>>>> Besides, if I enable forceLaCheck, JavaCC suggests to use a
>>>>> lookahead
>>>>>>>> of
>>>>>>>>> 3
>>>>>>>>>> or more. I guess we'd better get rid of these warnings if we
>>>> want to
>>>>>>>>> stick
>>>>>>>>>> to lookahead(2).
>>>>>>>>> 
>>>>>>>>> That makes sense. Actually we had a discussion[1] on moving to
>>>>>>>>> "LOOKAHEAD=1", and seems we are close to finish it. By doing this
>>>> we
>>>>>> have
>>>>>>>>> extra benefits that we don't need to turn forceLaCheck on and
>>>> JavaCC
>>>>>>>> should
>>>>>>>>> give suggestions during maven build.
>>>>>>>>> 
>>>>>>>>> Hongze
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2847
>>>>>>>>> 
>>>>>>>>>> On Mar 27, 2019, at 10:40, Rui Li <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Thanks Hongze for looking into the issue! Are you suggesting
>>>> this is
>>>>>>>> more
>>>>>>>>>> likely to be a JavaCC bug?
>>>>>>>>>> I filed a ticket anyway in case we want to further track it:
>>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-2957
>>>>>>>>>> Besides, if I enable forceLaCheck, JavaCC suggests to use a
>>>>> lookahead
>>>>>>>> of
>>>>>>>>> 3
>>>>>>>>>> or more. I guess we'd better get rid of these warnings if we
>>>> want to
>>>>>>>>> stick
>>>>>>>>>> to lookahead(2).
>>>>>>>>>> 
>>>>>>>>>> On Wed, Mar 27, 2019 at 8:54 AM Hongze Zhang <[email protected]>
>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Thanks, Yuzhao.
>>>>>>>>>>> 
>>>>>>>>>>> Since the more generic problem is that the production "E()"[1]
>>>>> causes
>>>>>>>>> the
>>>>>>>>>>> parent production's looking ahead returns too early, I tried to
>>>>> find
>>>>>> a
>>>>>>>>> bad
>>>>>>>>>>> case of the same reason under current default setting
>>>> LOOKAHEAD=2
>>>>> but
>>>>>>>> it
>>>>>>>>>>> seems that under this number we didn't have a chance to meet the
>>>>>> issue
>>>>>>>>> yet.
>>>>>>>>>>> 
>>>>>>>>>>> So after that I suggest to not to treat this as a Calcite's
>>>> issue
>>>>>>>>>>> currently.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Hongze
>>>>>>>>>>> 
>>>>>>>>>>> [1]
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> https://github.com/apache/calcite/blob/11c067f9992d9c8bc29e2326dd8b299ad1e9dbdc/core/src/main/codegen/templates/Parser.jj#L335
>>>>>>>>>>> 
>>>>>>>>>>>> On Mar 26, 2019, at 20:42, Yuzhao Chen <[email protected]>
>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Maybe we should fire a jira if it is a bug.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Danny Chan
>>>>>>>>>>>> 在 2019年3月26日 +0800 PM8:33,Hongze Zhang <[email protected]>,写道:
>>>>>>>>>>>>> Ops, correct a typo:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> "... after uncommenting a line ..." -> "... after commenting a
>>>>> line
>>>>>>>>>>>>> ...".
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Hongze
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ------ Original Message ------
>>>>>>>>>>>>> From: "Hongze Zhang" <[email protected]>
>>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>>> Sent: 2019/3/26 19:28:08
>>>>>>>>>>>>> Subject: Re: Calcite doesn't work with LOOKAHEAD(3)
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Firstly, thank you very much for sharing the case, Rui!
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I have run a test with the SQL you provided and also run into
>>>>> the
>>>>>>>>> same
>>>>>>>>>>> exception (under a global LOOKAHEAD 3). After debugging the
>>>>> generated
>>>>>>>>>>> parser code, I think the problem is probably in the generated
>>>>>>>> LOOKAHEAD
>>>>>>>>>>> method SqlParserImpl#jj_3R_42():
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> final private boolean jj_3R_42() {
>>>>>>>>>>>>>>> if (!jj_rescan) trace_call("SqlSelect(LOOKING AHEAD...)");
>>>>>>>>>>>>>>> if (jj_scan_token(SELECT)) { if (!jj_rescan)
>>>>>>>>>>> trace_return("SqlSelect(LOOKAHEAD FAILED)"); return true; }
>>>>>>>>>>>>>>> if (jj_3R_190()) { if (!jj_rescan)
>>>>>>>> trace_return("SqlSelect(LOOKAHEAD
>>>>>>>>>>> FAILED)"); return true; }
>>>>>>>>>>>>>>> { if (!jj_rescan) trace_return("SqlSelect(LOOKAHEAD
>>>>> SUCCEEDED)");
>>>>>>>>>>> return false; }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The LOOKAHEAD method checks only a single token <SELECT>.
>>>> This
>>>>> is
>>>>>>>>>>> definitely not enough since we have already set the number to 3.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Unfortunately I didn't find a root cause so far, but after
>>>>>>>>>>> uncommenting a line[1] in production "SqlSelect()" then
>>>> everything
>>>>>>>> goes
>>>>>>>>>>> back to normal. I'm inclined to believe JavaCC has some
>>>> unexpected
>>>>>>>>> behavior
>>>>>>>>>>> when dealing with LOOKAHEAD on a production with the shape like
>>>>>>>>>>> "SqlSelectKeywords()"[2].
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Please feel free to log a JIRA ticket with which we can track
>>>>>>>> further
>>>>>>>>>>> information of the issue.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Hongze
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> [1]
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> https://github.com/apache/calcite/blob/1b430721c0d9e22b2252ffcd893b42959cb7966c/core/src/main/codegen/templates/Parser.jj#L1030
>>>>>>>>>>>>>> [2]
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> https://github.com/apache/calcite/blob/1b430721c0d9e22b2252ffcd893b42959cb7966c/core/src/main/codegen/templates/Parser.jj#L288
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ------ Original Message ------
>>>>>>>>>>>>>> From: "Rui Li" <[email protected]>
>>>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>>>> Sent: 2019/3/26 16:53:44
>>>>>>>>>>>>>> Subject: Calcite doesn't work with LOOKAHEAD(3)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I'm trying to extend Calcite grammar to support some custom
>>>>>>>>>>> statements. And
>>>>>>>>>>>>>>> I need to increase LOOKAHEAD to 3 to resolve some ambiguity.
>>>>> But
>>>>>>>>> when
>>>>>>>>>>> I did
>>>>>>>>>>>>>>> that, the parser fails to parse queries like:
>>>>>>>>>>>>>>> * select t.key from (select key from src) t*
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> With exception:
>>>>>>>>>>>>>>> *Caused by:
>>>> org.apache.calcite.sql.parser.impl.ParseException:*
>>>>>>>>>>>>>>> *Encountered "( select key" at line 1, column 19.*
>>>>>>>>>>>>>>> *Was expecting one of:*
>>>>>>>>>>>>>>> * <IDENTIFIER> ...*
>>>>>>>>>>>>>>> * <QUOTED_IDENTIFIER> ...*
>>>>>>>>>>>>>>> * <BACK_QUOTED_IDENTIFIER> ...*
>>>>>>>>>>>>>>> * <BRACKET_QUOTED_IDENTIFIER> ...*
>>>>>>>>>>>>>>> * <UNICODE_QUOTED_IDENTIFIER> ...*
>>>>>>>>>>>>>>> * "LATERAL" ...*
>>>>>>>>>>>>>>> * "(" "WITH" ...*
>>>>>>>>>>>>>>> *...*
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> So I'm wondering whether there's some limitation on the
>>>>> LOOKAHEAD
>>>>>>>> we
>>>>>>>>>>> can
>>>>>>>>>>>>>>> use?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>>> Rui Li
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Best regards!
>>>>>>>>>> Rui Li
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Best regards!
>>>>>>>> Rui Li
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
> 

Reply via email to