Hello Grammatica users,

I want to write a parser for a language that allows nested comments: /* ... /* ... */ ... */ is valid, but /* ... /* ... */ is not. Obviously, I cannot cover that with just a regular expression. I started by defining tokens similar to the following (don't bother with correctness here ;-) ):
COMMENT_START = "/*"
COMMENT_END = "*/"
NESTED_COMMENT_CONTENTS = << ... (ugly regexp matching anything except COMMENT_START or COMMENT_END) >>

One big problem with this is that NESTED_COMMENT_CONTENTS, as intended, matches anything except COMMENT_START or COMMENT_END, which can be everything from the current position to the end of the input file! That changes the running time from close to O(n) to something like O(n^2): 102 sec. on a 34k input file.

Before NFAs were introduced for tokenizing (Grammatica up to 1.5 alpha 2, if I'm right), my solution was to
- add an "enabled" flag to the token patterns,
- hack the tokenizer to not match a token pattern that is not enabled,
- keep track of the number of COMMENT_START and COMMENT_END encountered, and
- enable NESTED_COMMENT_CONTENTS only when "inside" a comment.

Since the Grammatica 1.5 release, NESTED_COMMENT_CONTENTS is recognized by the new NFA implementation, where I cannot find an easy way to disable a token pattern.

Any suggestions?

Regards
Oliver

Oliver Gramberg
ABB AG Forschungszentrum Deutschland
E-mail: oliver.gramb...@de.abb.com
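P.S. To make the counting idea concrete, here is a rough, standalone sketch of what "keep track of the number of COMMENT_START and COMMENT_END encountered" amounts to. It does not use any Grammatica classes; the class name NestedCommentScanner and the method skipComment are made up for illustration only. It scans each character once, so it stays close to O(n):

```java
/**
 * Hypothetical helper, NOT part of Grammatica: skips one (possibly nested)
 * comment by counting comment starts and ends in a single pass.
 */
public final class NestedCommentScanner {

    private NestedCommentScanner() {
    }

    /**
     * Returns the index just past the comment that starts at 'start',
     * or -1 if no comment starts there or the comment is never closed.
     */
    public static int skipComment(CharSequence input, int start) {
        if (!startsWith(input, start, "/*")) {
            return -1; // not positioned on a COMMENT_START
        }
        int depth = 0; // number of currently open comments
        int pos = start;
        while (pos < input.length()) {
            if (startsWith(input, pos, "/*")) {
                depth++; // COMMENT_START: one level deeper
                pos += 2;
            } else if (startsWith(input, pos, "*/")) {
                depth--; // COMMENT_END: one level up
                pos += 2;
                if (depth == 0) {
                    return pos; // outermost comment is closed
                }
            } else {
                pos++; // ordinary comment contents
            }
        }
        return -1; // end of input reached inside a comment
    }

    /** Checks whether 'prefix' occurs in 's' at position 'pos'. */
    private static boolean startsWith(CharSequence s, int pos, String prefix) {
        if (pos < 0 || pos + prefix.length() > s.length()) {
            return false;
        }
        for (int i = 0; i < prefix.length(); i++) {
            if (s.charAt(pos + i) != prefix.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, skipComment("/* a /* b */ c */ x", 0) returns 17 (the index just past the outer comment), while skipComment("/* a /* b */", 0) returns -1 because the outer comment is never closed. How to hook something like this into the NFA-based tokenizer is exactly what I am unsure about.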
_______________________________________________ Grammatica-users mailing list Grammatica-users@nongnu.org http://lists.nongnu.org/mailman/listinfo/grammatica-users