[il-antlr-interest: 28282] Re: [antlr-interest] Grammar help

Brian Catlin Tue, 16 Mar 2010 02:12:42 -0700

In my excitement at not seeing any error messages, I neglected to really
test the parser :-(


I don't get the errors I was getting before, but that is because the
FILE_NAME token is matching everything. I put a simple printf action on the
FILE_NAME token, and it gets called for all input:

DT> @abc.def
Found file name: @abc.def
DT> illegal command
Found file name: illegal command
DT> 'alj;klajjf
Found file name: 'alj;klajjf

Is there a way to make the FILE_NAME token context sensitive so that the
lexer doesn't try to match it unless we're in a rule that wants to find a
file name?  I tried making the FILE_NAME token a fragment, but then the
parser failed to recognize anything as valid.

Here's the grammar:

//
// This grammar defines the commands available to the DiskTool (DT) program
//

grammar Commands;

options 
        {
        language = C;
        backtrack = true;
        memoize = true;
        }

@lexer::header
{
#define ANTLR3_INLINE_INPUT_ASCII
}

//+
// Productions
//-

commands
        :
        (script_command
        | dump_command
        | show_command
        )*;

script_command
        :  '@' 
        FILE_NAME
        ;

dump_command
        : DUMP
        (dump_struct
        | dump_block
        | a_file
        );

show_command
        : SHOW
        (structure_nouns
        | storage_nouns
        | a_file
        );
        
mbr_vbr
        : MBR 
        | VBR
        ;

block_nouns
        : LBN 
        | LCN 
        | VBN 
        | VCN
        ;

structure_nouns
        : MBR
        | VBR
        ;

dump_block
        : block_nouns
        number
        ((',' number) | (':' number))?
        DRIVE_NAME?
        ;

dump_struct
        : mbr_vbr
        ('/' qualifier)?
        DRIVE_NAME?
        ;

storage_nouns
        : DISK
        | VOLUME
        ;
        
a_file
        : FILE
        FILE_NAME
        ;

number
        : DEC_NUMBER 
        | HEX_NUMBER
        ;

qualifier
        : ALL
        | CODE
        | TABLE
        ;

//+
// Tokens
//-

// Verbs

DUMP    : 'DUMP';
SHOW    : 'SHOW';

// Nouns

DISK    : 'DISK';
FILE    : 'FILE';
LBN     : 'LBN';
LCN     : 'LCN';
MBR     : 'MBR';
PBN     : 'PBN';
VBN     : 'VBN';
VBR     : 'VBR';
VCN     : 'VCN';
VOLUME  : 'VOLUME';

// Qualifiers

ALL     : 'ALL';
CODE    : 'CODE';
TABLE   : 'TABLE';

// Miscellaneous tokens

DRIVE_NAME
        : LETTER ':'
        ;
        
fragment
LETTER  : 'A'..'Z';

fragment
DIGIT   : '0'..'9';

fragment
HEX_DIGIT       : (DIGIT | 'A'..'F');

HEX_NUMBER      : '0X' HEX_DIGIT+;

DEC_NUMBER      : DIGIT+;

FILE_NAME
        :  ~('|' | '<' | '>' | '*' | '?' | '\r' | '\n')+ (('\r'? '\n') | EOF)
        {printf("Found file name: \%s\n", GETTEXT()->chars);};

LINE_COMMENT
        : '!' ~('\n'|'\r')* (('\r'? '\n') | EOF) {$channel=HIDDEN;}
        {printf("Found comment: \%s\n", GETTEXT()->chars);};

WS      : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};
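
To illustrate why the printf fires for every line in the session above, here
is a minimal C sketch of the FILE_NAME pattern on its own. It is a hand-written
stand-in, not the code ANTLR generates: match_file_name and is_name_char are
hypothetical names, and the end of the string is assumed to play the role of
EOF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True for any character the FILE_NAME rule allows in the body of a name. */
static int is_name_char(char c)
{
    return c != '\0' && strchr("|<>*?\r\n", c) == NULL;
}

/*
 * Hand-written equivalent of
 *   ~('|' | '<' | '>' | '*' | '?' | '\r' | '\n')+ (('\r'? '\n') | EOF)
 * Returns the number of characters consumed from the start of s,
 * or 0 if the pattern does not match there.
 * Assumption: end of string stands in for EOF.
 */
size_t match_file_name(const char *s)
{
    size_t i = 0;

    assert(s != NULL);
    while (is_name_char(s[i]))      /* the ~(...)+ loop */
        i++;
    if (i == 0)
        return 0;                   /* '+' needs at least one character */

    if (s[i] == '\0')
        return i;                   /* EOF alternative */
    if (s[i] == '\r' && s[i + 1] == '\n')
        return i + 2;               /* '\r' '\n' */
    if (s[i] == '\n')
        return i + 1;               /* bare '\n' */
    return 0;                       /* stopped on an excluded character */
}
```

Called on each of the three lines typed at the DT> prompt, match_file_name
consumes the whole line, newline included. The pattern accepts a line such as
"DUMP MBR" just as readily, since it excludes only a handful of characters.
The sketch does not model how the ANTLR lexer chooses between competing token
rules; it only shows how little the pattern rejects.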


-----Original Message-----
From: Brian Catlin [mailto:[email protected]] 
Sent: Tuesday, March 16, 2010 16:18
To: '[email protected]'
Subject: RE: [antlr-interest] Grammar help

(Brian slaps head again), "Duh!"  Sigh.  Sometimes, I really wonder whether
I'm overpaid ;-}

You fixed it!

Thank you very much for your help!!

 -Brian


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Bart Kiers
Sent: Tuesday, March 16, 2010 15:33
To: [email protected]
Subject: Re: [antlr-interest] Grammar help

On Tue, Mar 16, 2010 at 8:10 AM, Brian Catlin <[email protected]> wrote:

> While that gets rid of those warnings (why don't the warnings print a 
> reasonable line number?  I would call that a BUG),


Note that '!' is a valid operator inside your grammar; ANTLR just assumes
that you're building trees. So, you're not doing anything wrong. But yes, a
warning that gives the line number of the improper use of rewrite operators
would be nice.


On Tue, Mar 16, 2010 at 8:10 AM, Brian Catlin <[email protected]> wrote:

> the fundamental problem
> of being able to parse (or otherwise capture the file name) still exists.
>
> Any ideas?
>

The error message is telling you that your FILE_NAME rule is ambiguous. When
matching one or more characters from:

~('|' | '<' | '>' | '*' | '?')+

then line breaks will also be matched, yet after that the rule still expects
to match:

('\r'? '\n')

which has already been "eaten" by the previous part of your rule. You could
fix that by adding line breaks to the negated set in that first part of your
rule, like this:

FILE_NAME    :  ~('|' | '<' | '>' | '*' | '?' | '\r' | '\n')+ (('\r'? '\n') | EOF);
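
The difference between the two negated sets can be sketched in a few lines of
C. This is only an illustration, assuming the ( ... )+ loop is greedy as
described above; negated_set_run and the two macros are hypothetical names,
not anything ANTLR generates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters excluded by the original rule and by the corrected rule. */
#define ORIGINAL_EXCLUDED "|<>*?"
#define FIXED_EXCLUDED    "|<>*?\r\n"

/*
 * Length of the leading run of s that contains none of the characters in
 * excluded, i.e. what a greedy ~(...)+ loop would consume before stopping.
 */
size_t negated_set_run(const char *s, const char *excluded)
{
    assert(s != NULL && excluded != NULL);
    return strcspn(s, excluded);
}
```

For the input "abc.def\nDUMP MBR\n", the original set lets the loop run through
all 17 characters, line breaks included, so nothing is left for the trailing
('\r'? '\n'). With '\r' and '\n' excluded, the loop stops after the 7
characters of "abc.def" and the line break is still there to be matched.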

Regards,

Bart.

List: http://www.antlr.org/mailman/listinfo/antlr-interest
Unsubscribe:
http://www.antlr.org/mailman/options/antlr-interest/your-email-address

