Thanks for your quick response, Nick.

You are right, it works with the example, but I am afraid this is not feasible
with my complete grammar.

If I did that for all my possible parameter values (the format depends on the
preceding parameter name), I would end up with a lot of lexer rules to sort out,
and they would certainly conflict:

STATION_NAME : LETTER LETTER LETTER DIGIT;
ADDRESS      : (LETTER|DIGIT) (LETTER|DIGIT) (LETTER|DIGIT) (LETTER|DIGIT) (LETTER|DIGIT) (LETTER|DIGIT) (LETTER|DIGIT);
LOCATION     : LETTER LETTER LETTER LETTER;
SESSION      : LETTER LETTER DIGIT DIGIT DIGIT DIGIT;
PROVIDER     : LETTER LETTER LETTER;
CODE         : LETTER (LETTER|DIGIT) (LETTER|DIGIT);
DATE         : DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT;
TIME         : (DIGIT DIGIT DIGIT DIGIT) | (DASH DASH DASH DASH) | (SPACE SPACE SPACE SPACE);
...
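To make the conflict concrete, the following Python sketch (an illustration, not ANTLR) translates each value rule above into a regular expression and lists every rule that matches a whole value. Any three-letter value such as "ABC" matches both PROVIDER and CODE, so a lexer that does not know which parameter name came before has no way to pick the right token. The regex translations and the `matching_rules` helper are assumptions made for this illustration, not generated ANTLR code.

```python
import re

# Hand translation of the lexer rules above (LETTER = [A-Za-z], DIGIT = [0-9]).
LETTER = "[A-Za-z]"
DIGIT = "[0-9]"
LETTER_OR_DIGIT = "[A-Za-z0-9]"

VALUE_RULES = {
    "STATION_NAME": LETTER * 3 + DIGIT,
    "ADDRESS": LETTER_OR_DIGIT * 7,
    "LOCATION": LETTER * 4,
    "SESSION": LETTER * 2 + DIGIT * 4,
    "PROVIDER": LETTER * 3,
    "CODE": LETTER + LETTER_OR_DIGIT * 2,
    "DATE": DIGIT * 6,
    "TIME": "(?:[0-9]{4}|-{4}| {4})",
}


def matching_rules(value: str) -> list[str]:
    """Return every rule whose pattern matches the entire value."""
    return [name for name, pattern in VALUE_RULES.items()
            if re.fullmatch(pattern, value)]


if __name__ == "__main__":
    for value in ["EST1", "ABC", "A12", "1234"]:
        print(f"{value!r}: {matching_rules(value)}")
```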


I think it is much better to have the lexer return a sequence of DIGIT and
LETTER tokens (except for parameter names) and to specify the expected sequence
for each parameter at the parser level. Something like this (again, this is
only an extract):

grammar test;

listOfParameters  : parameterDef (CRLF parameterDef)* EOF;

parameterDef      : stationParameter
                  | addressParameter
                  | locationParameter
                  | sessionParameter
                  | providerParameter
                  | codeParameter
                  | dateParameter
                  | timeParameter
                  ;

stationParameter  : STATION SPACE stationName;
stationName       : LETTER LETTER LETTER DIGIT;

addressParameter  : ADDRESS SPACE address;
address           : (LETTER|DIGIT) (LETTER|DIGIT) (LETTER|DIGIT) (LETTER|DIGIT) (LETTER|DIGIT) (LETTER|DIGIT) (LETTER|DIGIT);

locationParameter : LOCATION SPACE location;
location          : LETTER LETTER LETTER LETTER;

sessionParameter  : SESSION SPACE session;
session           : LETTER LETTER DIGIT DIGIT DIGIT DIGIT;

providerParameter : PROVIDER SPACE provider;
provider          : LETTER LETTER LETTER;

codeParameter     : CODE SPACE code;
code              : LETTER (LETTER|DIGIT) (LETTER|DIGIT);

dateParameter     : DATE SPACE date;
date              : DIGIT DIGIT DIGIT DIGIT DIGIT DIGIT;

timeParameter     : TIME SPACE time;
time              : (DIGIT DIGIT DIGIT DIGIT) | (DASH DASH DASH DASH) | (SPACE SPACE SPACE SPACE);


STATION  : 'STATION';
ADDRESS  : 'ADDRESS';
LOCATION : 'LOCATION';
SESSION  : 'SESSION';
PROVIDER : 'PROVIDER';
CODE     : 'CODE';
DATE     : 'DATE';
TIME     : 'TIME';
LETTER   : 'a'..'z' | 'A'..'Z';
DIGIT    : '0'..'9';
DASH     : '-';
SPACE    : ' ';
CRLF     : '\r'? '\n';
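The same parser-level idea can be sketched outside ANTLR in Python. It assumes each line holds one parameter name, a single space and a value. The name is recognised only in name position, the value is split into one-character LETTER/DIGIT/DASH/SPACE tokens, and the name selects which token sequence is accepted. `EXPECTED`, `char_type` and `parse_parameter` are hypothetical names. This is a sketch of the idea, not what ANTLR would generate.

```python
LETTER, DIGIT, DASH, SPACE = "LETTER", "DIGIT", "DASH", "SPACE"
LD = {LETTER, DIGIT}  # (LETTER|DIGIT)

# Expected value shape per parameter name, mirroring the parser rules above.
# Each parameter maps to a list of alternatives; each alternative is a
# sequence of allowed token types, one set per character position.
EXPECTED = {
    "STATION": [[{LETTER}] * 3 + [{DIGIT}]],
    "ADDRESS": [[LD] * 7],
    "LOCATION": [[{LETTER}] * 4],
    "SESSION": [[{LETTER}] * 2 + [{DIGIT}] * 4],
    "PROVIDER": [[{LETTER}] * 3],
    "CODE": [[{LETTER}] + [LD] * 2],
    "DATE": [[{DIGIT}] * 6],
    "TIME": [[{DIGIT}] * 4, [{DASH}] * 4, [{SPACE}] * 4],
}


def char_type(c: str) -> str:
    """Lexer side: every value character becomes a one-character token."""
    if c in "0123456789":
        return DIGIT
    if c.isascii() and c.isalpha():
        return LETTER
    if c == "-":
        return DASH
    if c == " ":
        return SPACE
    raise ValueError(f"unexpected character {c!r}")


def parse_parameter(line: str) -> tuple[str, str]:
    """Parser side: the parameter name decides which token sequence is valid."""
    name, sep, value = line.partition(" ")
    if name not in EXPECTED or not sep:
        raise ValueError(f"unknown parameter or missing value: {line!r}")
    types = [char_type(c) for c in value]
    for alternative in EXPECTED[name]:
        if len(types) == len(alternative) and all(
            t in allowed for t, allowed in zip(types, alternative)
        ):
            return name, value
    raise ValueError(f"invalid value for {name}: {value!r}")
```

With this split, "CODE ABC" and "PROVIDER ABC" are both accepted. The overlap between the two value formats no longer has to be resolved by the lexer.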






From: Nick Vlassopoulos [mailto:[email protected]]
Sent: 01 December 2010 15:10
To: COUJOULOU, Philippe
Cc: [email protected]
Subject: Re: [antlr-interest] Antlr lexer does not try other possible matches when it fails to match a token

Hello Philippe,

Although I am not an expert, I think you should let the lexer sort out
the "3 letters 1 digit" format of the station name. Alternatively, you could
probably match the station name as a generic identifier and check that it has
the correct format after parsing it.
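The alternative suggestion (lex the value as a generic identifier and check its format afterwards) might look roughly like this in Python. Splitting on the first space stands in for the real lexer and parser. `IDENTIFIER`, `STATION_NAME_FORMAT` and `check_station_parameter` are hypothetical names used only for this sketch.

```python
import re

# Hypothetical: the value is lexed as one generic identifier token...
IDENTIFIER = re.compile(r"[A-Za-z0-9]+")
# ...and the "3 letters 1 digit" rule is checked only after parsing.
STATION_NAME_FORMAT = re.compile(r"[A-Za-z]{3}[0-9]")


def check_station_parameter(line: str) -> str:
    """Parse 'STATION <identifier>', then validate the identifier's format."""
    keyword, _, value = line.partition(" ")
    if keyword != "STATION" or not IDENTIFIER.fullmatch(value):
        raise ValueError(f"syntax error: {line!r}")
    if not STATION_NAME_FORMAT.fullmatch(value):
        raise ValueError(f"invalid station name: {value!r}")
    return value


if __name__ == "__main__":
    print(check_station_parameter("STATION EST1"))
```

Because the grammar no longer encodes "3 letters 1 digit", the lexer never has to choose between overlapping rules. A wrong format is reported as a semantic error after parsing rather than as a token mismatch.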

I am not sure it is a good solution, but the following seems to work:

Best regards,

Nikos

-------------------------
grammar Stations;

stationParameter : KEYWORD_STATION SPACE stationName;

stationName      : STATION_NAME;

STATION_NAME     : LETTER LETTER LETTER DIGIT;

KEYWORD_STATION  : 'STATION';
LETTER           : 'a'..'z' | 'A'..'Z';
DIGIT            : '0'..'9';
SPACE            : ' ';
-------------------------


On Wed, Dec 1, 2010 at 2:18 PM, COUJOULOU, Philippe <[email protected]> wrote:
Dear all,

I am trying to parse a message that contains parameter values of the form
<PARAM_NAME> <VALUE>, for instance "STATION EST1".
Here is a very simple extract of my grammar for one of these parameters (the
one in the example above):

grammar test;

KEYWORD_STATION :       'STATION';
DIGIT    :        '0'..'9';
LETTER  :        'a'..'z' | 'A'..'Z';
SPACE   :       ' ';

stationParameter        :       KEYWORD_STATION SPACE stationName;
stationName     :       LETTER LETTER LETTER DIGIT;


The problem is that when I try to parse my example message (STATION EST1), I get
a MismatchedTokenException at the point where the input reaches the trailing
"ST1". After some analysis, I understood that the lexer generated the tokens
KEYWORD_STATION SPACE LETTER for the string "STATION E" and then attempted to
match the remaining "ST1" as KEYWORD_STATION, but failed to complete it.

At this point, I would expect the lexer to backtrack to the beginning of 'ST1' 
and then match it with LETTER LETTER DIGIT, but it doesn't.
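The symptom can be reproduced with a deliberately simplified Python model. It is not ANTLR's generated lexer DFA. `lex_committing` assumes the lexer commits to KEYWORD_STATION as soon as two characters agree with 'STATION' and never backs up. `lex_with_fallback` emits the keyword only when the whole word is present and otherwise falls back to a single-character token. The two-character commit point and both function names are assumptions chosen to mirror the behaviour described above.

```python
# Simplified model of the reported behaviour; NOT ANTLR's real lexer.
KEYWORDS = {"STATION": "KEYWORD_STATION"}


def char_token(c: str) -> str:
    """Single-character tokens corresponding to LETTER, DIGIT and SPACE."""
    if c.isascii() and c.isalpha():
        return "LETTER"
    if c in "0123456789":
        return "DIGIT"
    if c == " ":
        return "SPACE"
    raise ValueError(f"unexpected character {c!r}")


def lex_committing(text: str, lookahead: int = 2) -> list[str]:
    """Commit to a keyword once `lookahead` chars agree with it; never back up."""
    tokens, i = [], 0
    while i < len(text):
        for keyword, ttype in KEYWORDS.items():
            if text.startswith(keyword[:lookahead], i):
                if not text.startswith(keyword, i):
                    # No backtracking: the partial keyword match is fatal.
                    raise ValueError(f"mismatch while matching {ttype} at position {i}")
                tokens.append(ttype)
                i += len(keyword)
                break
        else:
            tokens.append(char_token(text[i]))
            i += 1
    return tokens


def lex_with_fallback(text: str) -> list[str]:
    """Emit a keyword only if the whole word matches; otherwise fall back to one char."""
    tokens, i = [], 0
    while i < len(text):
        for keyword, ttype in KEYWORDS.items():
            if text.startswith(keyword, i):
                tokens.append(ttype)
                i += len(keyword)
                break
        else:
            tokens.append(char_token(text[i]))
            i += 1
    return tokens


if __name__ == "__main__":
    print(lex_with_fallback("STATION EST1"))
    try:
        lex_committing("STATION EST1")
    except ValueError as err:
        print(err)
```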

I have tried various combinations of the "backtrack", "memoize" and "k" options
without any success, so I must have missed something. In case it helps, I am
using ANTLRWorks 1.4.

Could you please tell me how to make the lexer backtrack and try other
alternatives when a keyword of my language is not fully matched?

Thanks in advance for your help.

Best Regards,

Philippe Coujoulou.


The information in this e-mail is confidential. The contents may not be 
disclosed or used by anyone other than the addressee. Access to this e-mail by 
anyone else is unauthorised.
If you are not the intended recipient, please notify Airbus immediately and 
delete this e-mail.
Airbus cannot accept any responsibility for the accuracy or completeness of 
this e-mail as it has been sent over public networks. If you have any concerns 
over the content of this message or its Accuracy or Integrity, please contact 
Airbus immediately.
All outgoing e-mails from Airbus are checked using regularly updated virus 
scanning software but you should take whatever measures you deem to be 
appropriate to ensure that this message and any attachments are virus free.


List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe: 
http://www.antlr.org/mailman/options/antlr-interest/your-email-address

