I will take a look at this.  Thanks for sending it.  I appreciate all of your 
help.

Interestingly (or unfortunately, depending on your perspective), my 
understanding is that the grammar I'm trying to parse doesn't have precedence.  
It does support parentheses through which you can control precedence like you 
do in most languages.  

I've read one discussion which said (absent parentheses) the last operator 
wins.  So (a AND b OR c) would be interpreted as (a AND (b OR c)) but (a OR b 
AND c) would be interpreted as (a OR (b AND c)).  In fact, I just verified on 
our system that that's exactly how it works.  (a AND b OR c AND d) would be 
interpreted as (a AND (b OR (c AND d))).  Since this is an open source project 
(SOLR - lucene.apache.org), I don't have the option of changing how it works 
though it does seem a bit odd.

My experience is that people use a lot of parentheses with this system.  It's 
also very common to see long strings of items where the operator is the same 
(e.g., (a OR b OR c OR d... OR w)) though each of the terms (e.g., "a") could 
be several things ANDed together, but with parentheses around them to ensure 
they are treated as a single term.  This is why I want the AST to be (OR a b c 
d ... w).  For what I need to do (common sub-expression factoring/elimination), 
I think it's easier to deal with.

Thanks again

Scott

-----Original Message-----
From: John B. Brodie [mailto:[email protected]] 
Sent: Monday, August 08, 2011 4:44 PM
To: Scott Smith
Cc: [email protected]
Subject: Re: [antlr-interest] Can't generate the AST I want

On Mon, 2011-08-08 at 19:41 +0000, Scott Smith wrote:
> I am writing a grammar to generate ASTs.  There are operators called "AND" 
> and "OR" (defined in the lexer) in the language.  I want my parser to produce 
> ASTs that group together all of the terms which are using the same operator.
> 
> Also, there can be cases where there is no operator given.  In that case, 
> depending on the program configuration, it may be "AND" or "OR".  I've called 
> it DFLT_OP for the time being.  Here are some examples:
> 
> (a OR b OR c) -> (OR a b c)
> a b -> (DFLT_OP a b)                                                       // 
> no operator
> a AND (b OR c) AND d ->(AND a (OR b c) d) a AND b OR c AND d ->(AND 
> (OR (AND a b) c) d)  // I think this one is right; I don't need to 
> reorder terms because AND was used twice

i do not think this is correct.
you have (a AND b OR c AND d) as (((a AND b) OR c) AND d) but i think most 
people would expect it to be ((a AND b) OR (c AND d))

e.g. AND has higher precedence than OR (i think)

> 
> The grammar I've generated (somewhat simplified) is below.  I think 
> everything is pretty vanilla except for the definition of "expression":
> 
> grammar testGr;
> 
> options {
>   language = Java;
>   output = AST;
>   ASTLabelType = CommonTree;
> }
> 
> tokens {
>   DFLT_OP;
> }
> 
> @header {
>   package a.b.c;
> }
> 
> @lexer::header {
>   package a.b.c;
> }
> 
> filter     :    expression EOF ;
> 
> term :    IDENTIFIER
>      |    '('! expression ')'!
>      ;
> 
> // parsing "(a AND b AND c)" (w/o quotes)
> 
> // ATTEMPT 1
> // This gives (AND (AND a b) c).  This is correct, but what I // 
> really want is (AND a b c) //expression
> //   :    term ((AND^ | OR^)? term)*
> //   ;
> 
> // ATTEMPT2
> // This doesn't compile due to "recursive rule invocations".  I'm // 
> also not thrilled the term order changes //expression
> //   :    (id1=term OR)+ id2=term -> ^(OR id2 id1+)
> //   |    (id1=term AND)+ id2=term -> ^(AND id2 id1+)
> //   |    id1=term id2=term+  -> ^(DFLT_OP id1 id2+)
> //   |    term -> term
> //   ;
> 
> // ATTEMPT3
> // This compiles but appears to produce (AND a c).  No "b".

you want to use the `+=` meta operator (not `=`) when collecting a list of 
elements on the left of the `->` meta operator. e.g. id2+=term not id2=term. 
using the = on a list just gets the last element of the list as you have 
observed.

> // expression
> //options {
> //   backtrack=true;
> //}
> //   :    id1=term (OR id2=term)+ -> ^(OR $id1 $id2+)
> //   |    id1=term (AND id2=term)+ -> ^(AND $id1 $id2+)
> //   |    id1=term id2=term+  -> ^(DFLT_OP $id1 $id2+)
> //   |    term -> term
> //   ;
> //
> 
> // This doesn't compile due to "recursive rule invocations".
> expression
>      :    id1=term OR id2=term (OR id3=term)+ -> ^(OR $id1 $id2 $id3+)
>      |    id1=term AND id2=term (AND id3=term)+ -> ^(AND $id1 $id2 $id3+)
>      |    id1=term id2=term+  -> ^(DFLT_OP $id1 $id2+)
>      |    term -> term
>      ;
> 
> AND  :    'AND' | '&&' ;
> OR   :    'OR' | '||' ;
> IDENTIFIER :    LETTER (LETTER | NUMBER)* ;
> fragment LETTER :    LOWER | UPPER ;
> fragment LOWER  :    'a'..'z' ;
> fragment UPPER  :    'A'..'Z' ;
> fragment NUMBER :    '0'..'9' ;
> WS  : (' ' | '\t' | '\n' | '\r') {$channel=HIDDEN; } ;
> 
> Is there a way to tell ANTLR what I want to do?  How should I write the 
> expression?
> 
> Any thoughts about the minor problem of substituting AND or OR instead of the 
> DFLT_OP.  I figured out how to pass a Boolean to my parser which tells it 
> which one of these is the correct one.  Can I write something like:
> 
>                 ({dftlOpAND ? AND : OR} $id1 $id2)

I am not sure that you can do this in constructing a root of a sub-tree.

attached please find a complete grammar that solves all 3 of the problems 
identified:
        a) precedence of AND over OR
        b) making AND and OR n-ary trees rather than binary
        c) inserting an imaginary AND or OR as root when no connective
           operator is present.

Hope this helps...
   -jbb


List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: 
http://www.antlr.org/mailman/options/antlr-interest/your-email-address

-- 
You received this message because you are subscribed to the Google Groups 
"il-antlr-interest" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/il-antlr-interest?hl=en.

Reply via email to