Hello Grammatica users,

I want to write a parser for a language that allows nested comments:  /* 
... /* ... */ ... */ is valid but /* ... /* ... */ is not. Obviously, I 
cannot cover that with just a regular expression. I started by defining 
tokens similar to the following: (don't bother with correctness here ;-) )

COMMENT_START = "/*"
COMMENT_END = "*/"
NESTED_COMMENT_CONTENTS = << ... (ugly regexp matching anything except 
COMMENT_START or COMMENT_END) >>
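
The nesting rule itself is easy to state with a depth counter, even though 
a single regular expression cannot express it. The following is a minimal 
Java sketch for illustration only; it is not part of Grammatica, and the 
class and method names are made up for this example.

```java
// Illustration only: not Grammatica code. All names here are hypothetical.
final class NestedCommentCheck {

    // Returns true if every comment start in the text is closed by a
    // matching comment end, and no comment end appears without a start.
    static boolean isBalanced(String text) {
        int depth = 0; // number of comments currently open
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("/*", i)) {
                depth++;
                i += 2;
            } else if (text.startsWith("*/", i)) {
                if (depth == 0) {
                    return false; // comment end without a start
                }
                depth--;
                i += 2;
            } else {
                i++; // ordinary character
            }
        }
        return depth == 0; // false if a comment is still open
    }
}
```

With this sketch, isBalanced accepts the first example from above and 
rejects the second, because one comment is still open at the end of the 
input.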

One big problem with these definitions is that NESTED_COMMENT_CONTENTS, as 
intended, matches anything except COMMENT_START or COMMENT_END, so a single 
match can extend from the current position all the way to the end of the 
input file! That changes the running time from close to O(n) to something 
like O(n^2): 102 sec. on a 34k input file.

Before NFAs were introduced for tokenizing (Grammatica up to 1.5 alpha 2, 
if I'm right), my solution was to
- add an "enabled" flag to the token patterns,
- hack the tokenizer to not match a token pattern that is not enabled,
- keep track of the number of COMMENT_START and COMMENT_END tokens 
encountered, and
- enable NESTED_COMMENT_CONTENTS only when "inside" a comment.
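
The steps above can be sketched roughly as follows. This is a standalone 
Java illustration built on java.util.regex, not the actual Grammatica 
tokenizer or its API: the classes ToggleTokenizer and TokenPattern, the 
OTHER pattern (a stand-in for the rest of the language), and the concrete 
regular expressions are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: not Grammatica code. All names here are hypothetical.
final class ToggleTokenizer {

    // A token pattern that can be switched on and off.
    static final class TokenPattern {
        final String name;
        final Pattern regex;
        boolean enabled;

        TokenPattern(String name, String regex, boolean enabled) {
            this.name = name;
            this.regex = Pattern.compile(regex);
            this.enabled = enabled;
        }
    }

    private final TokenPattern start =
        new TokenPattern("COMMENT_START", "/\\*", true);
    private final TokenPattern end =
        new TokenPattern("COMMENT_END", "\\*/", true);
    // Assumed contents pattern: a run of characters other than '/' and '*',
    // or a single '/' or '*' that does not begin a comment delimiter.
    private final TokenPattern contents =
        new TokenPattern("NESTED_COMMENT_CONTENTS",
                         "[^/*]+|/(?!\\*)|\\*(?!/)", false);
    // Stand-in for all other tokens of the language: any single character.
    private final TokenPattern other =
        new TokenPattern("OTHER", "(?s).", true);

    private final List<TokenPattern> patterns =
        Arrays.asList(start, end, contents, other);

    // Returns the names of the tokens found in the input, in order.
    List<String> tokenize(String input) {
        List<String> names = new ArrayList<>();
        int depth = 0; // COMMENT_START count minus COMMENT_END count
        int pos = 0;
        while (pos < input.length()) {
            // Enable the contents pattern only while inside a comment.
            contents.enabled = depth > 0;
            other.enabled = depth == 0;

            TokenPattern matched = null;
            int endPos = pos;
            for (TokenPattern p : patterns) {
                if (!p.enabled) {
                    continue; // disabled patterns are never tried
                }
                Matcher m = p.regex.matcher(input).region(pos, input.length());
                if (m.lookingAt()) {
                    matched = p;
                    endPos = m.end();
                    break; // first enabled pattern that matches wins
                }
            }
            if (matched == null) {
                throw new IllegalStateException("no token at position " + pos);
            }
            if (matched == start) {
                depth++;
            } else if (matched == end && depth > 0) {
                depth--;
            }
            names.add(matched.name);
            pos = endPos;
        }
        return names;
    }
}
```

In this sketch the contents pattern is only ever tried while the depth 
counter is positive, so it never scans ordinary source text. It also stops 
at the first '/' or '*', so a long comment body may come out as several 
NESTED_COMMENT_CONTENTS tokens rather than one.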

Since the Grammatica 1.5 release, NESTED_COMMENT_CONTENTS is recognized by 
the new NFA implementation, where I cannot find an easy way to disable a 
token pattern.

Any suggestions?

Regards
Oliver


Oliver Gramberg
ABB AG
_______________________________________________
Grammatica-users mailing list
Grammatica-users@nongnu.org
http://lists.nongnu.org/mailman/listinfo/grammatica-users
