Welcome the world of LL parsers (e.g. JavaCC and Antlr). They are less powerful 
than LR parsers (e.g. yacc) because they have to know that a production is 
going to succeed before they jump into it, but they are easier to use, and tend 
to produce human-readable error messages.

Sometimes you can use a bit of extra lookahead — i.e. read a small, fixed 
number of tokens — to make the decision before you jump, but quite often you 
can’t.

Sometimes semantic lookahead (i.e. try to parse and backtrack if you can’t) is 
touted as the solution but it can have truly awful performance. (During the 
week I bumped into https://issues.apache.org/jira/browse/HIVE-18624 
<https://issues.apache.org/jira/browse/HIVE-18624>, where the running time is 
literally exponential in the number of parentheses.)

The best solution is to create your own non-deterministic states, e.g. the 
state of having just seen a parentheses but not knowing yet whether it was an 
expression or a sub-query. It is not easy to code, but it works. You’re 
basically emulating what an LR parser would do.

Julian
 


> On Mar 31, 2019, at 2:43 PM, Muhammad Gelbana <m.gelb...@gmail.com> wrote:
> 
> I think upgrading JavaCC maven plugin could break things if:
> 1. The grammar changed, and I highly doubt that, or,
> 2. We have bugs in our grammar that were fixed in the upgraded JavaCC
> library.
> 
> Thanks,
> Gelbana
> 
> 
> On Sun, Mar 31, 2019 at 11:41 PM Muhammad Gelbana <m.gelb...@gmail.com>
> wrote:
> 
>> Thanks for the suggestion Stamatis, but that didn't work for me. It caused
>> compilation errors in SqlParserImpl and I couldn't see a way to resolve
>> them.
>> 
>> Thanks,
>> Gelbana
>> 
>> 
>> On Sun, Mar 31, 2019 at 6:01 PM Stamatis Zampetakis <zabe...@gmail.com>
>> wrote:
>> 
>>> I am not sure why the update breaks so many tests neither if there is a
>>> problem with the LOOKAHEAD but regarding CALCITE-2844, I would be inclined
>>> to modify lines around [3] to make it work.
>>> 
>>> In particular, I would try to make <TABLE> with parentheses optional (just
>>> in case you didn't try this so far).
>>> 
>>> Best,
>>> Stamatis
>>> 
>>> [3]
>>> 
>>> https://github.com/apache/calcite/blob/81fa5314e94e86b6cf8df244b03f9d57c884f54d/core/src/main/codegen/templates/Parser.jj#L1884
>>> 
>>> Στις Κυρ, 31 Μαρ 2019 στις 5:16 μ.μ., ο/η Muhammad Gelbana <
>>> m.gelb...@gmail.com> έγραψε:
>>> 
>>>> I was trying to support selecting from table functions[1]. I tried
>>>> extending TableRef2[2] (Production ?) to support table functions by
>>> adding
>>>> 
>>>>> LOOKAHEAD(3)
>>>>> 
>>>> tableRef = TableFunctionCall(getPos()))
>>>>> 
>>>> |
>>>>> 
>>>> before
>>>> 
>>>>> LOOKAHEAD(2)
>>>>> tableRef = CompoundIdentifier()
>>>>> 
>>>> 
>>>> but it broke other tests. I tried putting my modification at the end of
>>> the
>>>> choices while increasing the CompoundIdentifier() lookahead to 3 to
>>> avoid
>>>> that choice when it faces the left bracket, but it didn't work too. I
>>> tried
>>>> setting aggresively high lookahead values such as 50, and it didn't work
>>>> too. I won't be surprised if I'm doing anything wrong as I'm not
>>> accustomed
>>>> to working with grammar files anyway.
>>>> 
>>>> The only thing I'm considering now is to create a new production (I'm
>>> not
>>>> sure if I'm using this word correctly) such as TableRef3 and have that
>>>> going down the common path between TableFunctionCall() and
>>>> CompoundIdentifier() because TableFunctionCall() eventually attempts to
>>>> cosnume a CompoundIdentifier(). This way I won't have to bother about
>>>> tuning lookaheads I suppose.
>>>> 
>>>> I can create a branch of what I've accomplished so far if you wish.
>>>> 
>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2844
>>>> [2]
>>>> 
>>>> 
>>> https://github.com/apache/calcite/blob/master/core/src/main/codegen/templates/Parser.jj#L1811
>>>> 
>>>> Thanks,
>>>> Gelbana
>>>> 
>>>> 
>>>> On Sun, Mar 31, 2019 at 4:15 PM Hongze Zhang <notify...@126.com> wrote:
>>>> 
>>>>> Just out of my curiosity, could you please share your case about
>>>>> "LOOKAHEAD doest not work as expected"? Does changing to JavaCC 5.0
>>>>> actually fixes the problem?
>>>>> 
>>>>> Thanks,
>>>>> Hongze
>>>>> 
>>>>> 
>>>>>> On Mar 31, 2019, at 19:17, Muhammad Gelbana <m.gelb...@gmail.com>
>>>> wrote:
>>>>>> 
>>>>>> I'm facing trouble with supporting selecting from table function for
>>>>> Babel
>>>>>> parser and I beleive that LOOKAHEAD isn't working as expected too.
>>>>>> I thought it might actually be a bug so I checked out the master
>>> branch
>>>>> and
>>>>>> updated the JavaCC maven plugin version to 2.6 (it's currently 2.4),
>>>> but
>>>>>> that failed *142* test cases and errored *9*.
>>>>>> 
>>>>>> The plugin v2.4 imports the JavaCC library v4
>>>>>> The plugin v2.6 imports the JavaCC library v5
>>>>>> 
>>>>>> Unfortunately the release notes for the JavaCC library are broken
>>> and
>>>> I'm
>>>>>> not aware of another source for the release notes for that project.
>>>>>> Should I open a Jira to upgrade that plugin version ?
>>>>>> 
>>>>>> Thanks,
>>>>>> Gelbana
>>>>>> 
>>>>>> 
>>>>>> On Thu, Mar 28, 2019 at 4:18 AM Rui Li <lirui.fu...@gmail.com>
>>> wrote:
>>>>>> 
>>>>>>> Thanks Hongze, that's good to know.
>>>>>>> 
>>>>>>> On Thu, Mar 28, 2019 at 8:43 AM Hongze Zhang <notify...@126.com>
>>>> wrote:
>>>>>>> 
>>>>>>>>> Besides, if I enable forceLaCheck, JavaCC suggests to use a
>>>> lookahead
>>>>>>> of
>>>>>>>> 3
>>>>>>>>> or more. I guess we'd better get rid of these warnings if we
>>> want to
>>>>>>>> stick
>>>>>>>>> to lookahead(2).
>>>>>>>> 
>>>>>>>> That makes sense. Actually we had a discussion[1] on moving to
>>>>>>>> "LOOKAHEAD=1", and seems we are close to finish it. By doing this
>>> we
>>>>> have
>>>>>>>> extra benefits that we don't need to turn forceLaCheck on and
>>> JavaCC
>>>>>>> should
>>>>>>>> give suggestions during maven build.
>>>>>>>> 
>>>>>>>> Hongze
>>>>>>>> 
>>>>>>>> 
>>>>>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2847
>>>>>>>> 
>>>>>>>>> On Mar 27, 2019, at 10:40, Rui Li <lirui.fu...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Thanks Hongze for looking into the issue! Are you suggesting
>>> this is
>>>>>>> more
>>>>>>>>> likely to be a JavaCC bug?
>>>>>>>>> I filed a ticket anyway in case we want to further track it:
>>>>>>>>> https://issues.apache.org/jira/browse/CALCITE-2957
>>>>>>>>> Besides, if I enable forceLaCheck, JavaCC suggests to use a
>>>> lookahead
>>>>>>> of
>>>>>>>> 3
>>>>>>>>> or more. I guess we'd better get rid of these warnings if we
>>> want to
>>>>>>>> stick
>>>>>>>>> to lookahead(2).
>>>>>>>>> 
>>>>>>>>> On Wed, Mar 27, 2019 at 8:54 AM Hongze Zhang <notify...@126.com>
>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Thanks, Yuzhao.
>>>>>>>>>> 
>>>>>>>>>> Since the more generic problem is that the production "E()"[1]
>>>> causes
>>>>>>>> the
>>>>>>>>>> parent production's looking ahead returns too early, I tried to
>>>> find
>>>>> a
>>>>>>>> bad
>>>>>>>>>> case of the same reason under current default setting
>>> LOOKAHEAD=2
>>>> but
>>>>>>> it
>>>>>>>>>> seems that under this number we didn't have a chance to meet the
>>>>> issue
>>>>>>>> yet.
>>>>>>>>>> 
>>>>>>>>>> So after that I suggest to not to treat this as a Calcite's
>>> issue
>>>>>>>>>> currently.
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Hongze
>>>>>>>>>> 
>>>>>>>>>> [1]
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>> https://github.com/apache/calcite/blob/11c067f9992d9c8bc29e2326dd8b299ad1e9dbdc/core/src/main/codegen/templates/Parser.jj#L335
>>>>>>>>>> 
>>>>>>>>>>> On Mar 26, 2019, at 20:42, Yuzhao Chen <yuzhao....@gmail.com>
>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Maybe we should fire a jira if it is a bug.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Danny Chan
>>>>>>>>>>> 在 2019年3月26日 +0800 PM8:33,Hongze Zhang <notify...@126.com>,写道:
>>>>>>>>>>>> Ops, correct a typo:
>>>>>>>>>>>> 
>>>>>>>>>>>> "... after uncommenting a line ..." -> "... after commenting a
>>>> line
>>>>>>>>>>>> ...".
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Hongze
>>>>>>>>>>>> 
>>>>>>>>>>>> ------ Original Message ------
>>>>>>>>>>>> From: "Hongze Zhang" <notify...@126.com>
>>>>>>>>>>>> To: dev@calcite.apache.org
>>>>>>>>>>>> Sent: 2019/3/26 19:28:08
>>>>>>>>>>>> Subject: Re: Calcite doesn't work with LOOKAHEAD(3)
>>>>>>>>>>>> 
>>>>>>>>>>>>> Firstly, thank you very much for sharing the case, Rui!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I have run a test with the SQL you provided and also run into
>>>> the
>>>>>>>> same
>>>>>>>>>> exception (under a global LOOKAHEAD 3). After debugging the
>>>> generated
>>>>>>>>>> parser code, I think the problem is probably in the generated
>>>>>>> LOOKAHEAD
>>>>>>>>>> method SqlParserImpl#jj_3R_42():
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> final private boolean jj_3R_42() {
>>>>>>>>>>>>>> if (!jj_rescan) trace_call("SqlSelect(LOOKING AHEAD...)");
>>>>>>>>>>>>>> if (jj_scan_token(SELECT)) { if (!jj_rescan)
>>>>>>>>>> trace_return("SqlSelect(LOOKAHEAD FAILED)"); return true; }
>>>>>>>>>>>>>> if (jj_3R_190()) { if (!jj_rescan)
>>>>>>> trace_return("SqlSelect(LOOKAHEAD
>>>>>>>>>> FAILED)"); return true; }
>>>>>>>>>>>>>> { if (!jj_rescan) trace_return("SqlSelect(LOOKAHEAD
>>>> SUCCEEDED)");
>>>>>>>>>> return false; }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The LOOKAHEAD method checks only a single token <SELECT>.
>>> This
>>>> is
>>>>>>>>>> definitely not enough since we have already set the number to 3.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Unfortunately I didn't find a root cause so far, but after
>>>>>>>>>> uncommenting a line[1] in production "SqlSelect()" then
>>> everything
>>>>>>> goes
>>>>>>>>>> back to normal. I'm inclined to believe JavaCC has some
>>> unexpected
>>>>>>>> behavior
>>>>>>>>>> when dealing with LOOKAHEAD on a production with the shape like
>>>>>>>>>> "SqlSelectKeywords()"[2].
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Please feel free to log a JIRA ticket with which we can track
>>>>>>> further
>>>>>>>>>> information of the issue.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Hongze
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1]
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>> https://github.com/apache/calcite/blob/1b430721c0d9e22b2252ffcd893b42959cb7966c/core/src/main/codegen/templates/Parser.jj#L1030
>>>>>>>>>>>>> [2]
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>>> https://github.com/apache/calcite/blob/1b430721c0d9e22b2252ffcd893b42959cb7966c/core/src/main/codegen/templates/Parser.jj#L288
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ------ Original Message ------
>>>>>>>>>>>>> From: "Rui Li" <lirui.fu...@gmail.com>
>>>>>>>>>>>>> To: dev@calcite.apache.org
>>>>>>>>>>>>> Sent: 2019/3/26 16:53:44
>>>>>>>>>>>>> Subject: Calcite doesn't work with LOOKAHEAD(3)
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I'm trying to extend Calcite grammar to support some custom
>>>>>>>>>> statements. And
>>>>>>>>>>>>>> I need to increase LOOKAHEAD to 3 to resolve some ambiguity.
>>>> But
>>>>>>>> when
>>>>>>>>>> I did
>>>>>>>>>>>>>> that, the parser fails to parse queries like:
>>>>>>>>>>>>>> * select t.key from (select key from src) t*
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> With exception:
>>>>>>>>>>>>>> *Caused by:
>>> org.apache.calcite.sql.parser.impl.ParseException:*
>>>>>>>>>>>>>> *Encountered "( select key" at line 1, column 19.*
>>>>>>>>>>>>>> *Was expecting one of:*
>>>>>>>>>>>>>> * <IDENTIFIER> ...*
>>>>>>>>>>>>>> * <QUOTED_IDENTIFIER> ...*
>>>>>>>>>>>>>> * <BACK_QUOTED_IDENTIFIER> ...*
>>>>>>>>>>>>>> * <BRACKET_QUOTED_IDENTIFIER> ...*
>>>>>>>>>>>>>> * <UNICODE_QUOTED_IDENTIFIER> ...*
>>>>>>>>>>>>>> * "LATERAL" ...*
>>>>>>>>>>>>>> * "(" "WITH" ...*
>>>>>>>>>>>>>> *...*
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> So I'm wondering whether there's some limitation on the
>>>> LOOKAHEAD
>>>>>>> we
>>>>>>>>>> can
>>>>>>>>>>>>>> use?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>>> Rui Li
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Best regards!
>>>>>>>>> Rui Li
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Best regards!
>>>>>>> Rui Li
>>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 

Reply via email to