Re: [antlr-dev] Operator precedence parsing

Sam Harwell Wed, 29 Apr 2009 07:47:35 -0700

I currently haven't implemented the precedence parser as part of the
ANTLR codegen. I only built the backend to get an idea of what the
output might look like. It takes a flat input of arbitrary root elements
and tokens that are [part of] some operator (ternary operators can have
2 tokens). The precedence tree is composed of groups of operators, where
each group is either left-to-right or right-to-left and contains 1 or
more member operators. Operators fall into one of the following
categories:


Implemented:

UnaryOperator, either prefix or postfix
BinaryOperator
TernaryOperator

Not yet implemented, but considering:

UnaryGroupOperator, able to handle what is normally the (expr) root
element
BinaryGroupOperator, either prefix or postfix, handles function calls
and prefix items like Chapel's domain expressions
SequencingOperator, such as the C++ comma operator

Operators with 1 token use that token as the AST root, optionally
setting its token type to some other type (eg. so the walker can
distinguish prefix/postfix ++/--). Operators with 2 tokens use the first
as the root, optionally setting the type, and optionally including the
second operator token as its last child.

Sam

-----Original Message-----
From: Peter Burns [mailto:[email protected]] 
Sent: Thursday, April 23, 2009 6:35 PM
To: Terence Parr
Cc: Sam Harwell; ANTLR-dev Dev
Subject: Re: [antlr-dev] FW: Unary operator precedence

We can handle unary operators at any precedence level.

Here's an example:

e options {strategy=precedence;}
   : e ('$'|'$!') e
   | e ('>>'|'>>=') e
   | e '||' <associativity=right> e
   | e '&&' <associativity=right> e
   | e ('=='|'/='|'<'|'<='|'>='|'>') e
   | e ':' <associativity=right> e
   | e ('+'|'-') e
   | e ('*'|'/') e
   | '-' e
   | e ('^' <associativity=right>|'^^' <associativity=right>|'**'  
<associativity=right>) e
   | e '.' <associativity=right> e
   | NUMBER
   | BOOL
   | ID
   ;

This passes the following gunit tests.  The tenth one ensures that we  
get unary precedence right:

expression:
"1234" -> "1234"
"-1234" -> (- 1234)
"1+1" -> (+ 1 1)
"2*2" -> (* 2 2)
"1==1" -> (== 1 1)
"1*1+2" -> (+ (* 1 1) 2)
"1+1*2" -> (+ 1 (* 1 2))
"1^2^3" -> (^ 1 (^ 2 3))
"2^3^^4**5" -> (^ 2 (^^ 3 (** 4 5)))
"1*-2+3" -> (+ (* 1 (- 2)) 3)
"1:2>>=f>>g" -> (>> (>>= (: 1 2) f) g)
"1+1==2&&True||False" -> (|| (&& (== (+ 1 1) 2) True) False)
"f.g$h" -> ($ (. f g) h)

There's a single string template file that would need to be rewritten  
for each target language, but it's very straightforward.

-Peter

On Apr 23, 2009, at 2:32 PM, Terence Parr wrote:

> Hi Sam,
>
> I'm working on an ANTLR preproc with Peter Burns here at USF.
>
> Peter: we don't require unary higher than binary do we?
>
> Ter
> On Apr 23, 2009, at 10:59 AM, Sam Harwell wrote:
>
>> I'm working on a generalized expression parser backend that I hope  
>> to integrate as generated code in ANTLR targets. The forwarded  
>> email below shows a particular issue that needs to be addressed  
>> before the expression parser can be used with languages with "odd"  
>> precedence orders. I'm looking for feedback on my proposed  
>> resolution in my parser backend.
>>
>> There shouldn't be a restriction in the code that says all unary  
>> operators have to have higher precedence than any binary operators.  
>> As such, I'm planning on handling this by adding the following  
>> enumeration, and either allowing it (with a default value of  
>> Elevate) or explicitly requiring it for unary operators that come  
>> before some binary operator in the precedence order.
>>
>> This issue fully addresses ternary operators as well, where the  
>> required elevation involves the use of unary and/or binary  
>> operators in the middle section, such as the {value if true}  
>> section of the C++ ternary operator: {condition} ? {value if  
>> true} : {value if false}.
>>
>> /// <summary>
>> /// Controls the behavior of lower priority, lower cardinality  
>> operator next to higher priority,
>> /// higher cardinality operator.
>> /// </summary>
>> /// <example>
>> /// In languages like Matlab, the unary +/- operators have a lower  
>> priority than the binary
>> /// exponentiation operator. The user needs to be able to configure  
>> the handling of the unary
>> /// +/- operators in the following case:
>> ///
>> ///   2^-2
>> ///
>> /// If the unary operator's precedence remains lower than the  
>> exponentiation operator, this
>> /// is syntactically incorrect. As an option, and one that Matlab  
>> uses, the operator's
>> /// precedence immediately following an exponentiation operator is  
>> elevated, so the expression
>> /// above is allowed.
>> /// </example>
>> public enum MixedCardinalityAction
>> {
>>    /// <summary>
>>    /// Never re-order the operator precedence of the lower priority  
>> operator to allow its
>>    /// use next to a higher priority operator.
>>    /// </summary>
>>    None,
>>    /// <summary>
>>    /// When syntactically unambiguous, elevates the lower  
>> cardinality operator's precedence
>>    /// to allow its usage next to a higher cardinality operator.
>>    /// </summary>
>>    Elevate
>> }
>>
>> Thank you,
>> Sam Harwell
>>
>> From: Sam Harwell
>> Sent: Thursday, April 23, 2009 10:05 AM
>> To: '[email protected]'
>> Subject: Unary operator precedence
>>
>> Hello,
>>
>> I am having some trouble with the unary operator precedence in  
>> certain situations. According to the spec, the multiplication  
>> operator has higher precedence than the unary - operator, which is  
>> posing a problem for the following expression:
>>
>> 3 * -2
>>
>> For the fixed precedence levels as documented, this appears to be a  
>> syntax error unless written like this:
>>
>> 3 * (-2)
>>
>> One way to allow this syntax is cause the unary operator  
>> immediately following a *, /, %, reduce, scan, !, ~, or ** to have  
>> higher than normal precedence. Operators ! and ~ do not need this  
>> clarification following *, /, or %. The following is an extract of  
>> a grammar for a recursive descent LL parser.
>>
>> not_expression
>>        :       ('!' | '~')*
>>                reduce_scan_expression
>>        ;
>>
>> multiply_expression
>>        :       not_expression
>>                (       ('*' | '/' | '%') =>
>>                        ('*' | '/' | '%')
>>                        not_expression
>>                )*
>>        ;
>>
>> negate_expression
>>        :       ('+' | '-')*
>>                multiply_expression
>>        ;
>>
>> addition_expression
>>        :       negate_expression
>>                (       ('+' | '-') =>
>>                        ('+' | '-')
>>                        negate_expression
>>                )*
>>        ;
>>
>> I would have to make the following alteration which, despite being  
>> unambiguous, clearly shows that a unary operator following a  
>> multiplicative operator has a different precedence from the same  
>> operator in other contexts.
>>
>> multiply_expression
>>        :       not_expression
>>                (       ('*' | '/' | '%') =>
>>                        ('*' | '/' | '%')
>>                        ('+' | '-')? not_expression
>>                )*
>>        ;
>>
>> negate_expression
>>        :       ('+' | '-')*
>>                multiply_expression
>>        ;
>>
>> Octave (and hence Matlab I'm guessing) follows this approach  
>> (-2^2=-4 and 2*-2=-4). C++ handles this problem by making .* and - 
>> >* their own operators, then making all unary operators have a  
>> higher precedence than any binary operators. Either way it goes, I  
>> believe the order should be documented so it does not pose a  
>> problem for people creating a Chapel compiler in the future. :)
>>
>> Thank you,
>> Sam Harwell
>>
>> _______________________________________________
>> antlr-dev mailing list
>> [email protected]
>> http://www.antlr.org/mailman/listinfo/antlr-dev
>

_______________________________________________
antlr-dev mailing list
[email protected]
http://www.antlr.org/mailman/listinfo/antlr-dev

Re: [antlr-dev] Operator precedence parsing

Reply via email to