Grammatica currently doesn't support productions that match a null input. Hence it cannot handle grammars that match an empty input. Normally this isn't such a big deal, but it might of course be annoying. The reason for this was to make the production look-ahead handling manageable.
Good point regarding the ordering of the productions. I think it is written down somewhere, but obviously not in the right place as I couldn't find it in the reference manual just now... :-( It would be great if you create a new version with the features you mention. It is a much more ambitious goal than the current Grammatica though, as you are then entering into the domains of context sensitive grammars. I created Grammatica as a pretty standard context *in*sensitive LL(k) parser generator as I know the theory behind that. Going context sensitive results in a whole new complexity with regard to the ambiguity handling, as this example illustrates: %tokens% TOKENX = "x" LETTER = <<a-z>> %productions% A = B | C ; B = "x" LETTER ; C = LETTER "x" ; Is this an ambiguity? For the input "xx", how do one choose between B or C? I opted out of this and created a normal LL(k) parser instead. That said, extending the Tokenizer class to return all the token matches from a specified position would of course be a nice generalization and not too tricky to pull off. The current class only returns the longest one (or the first one if two have the same length). From that point one can then start to experiment with error recovery or context sensitivity in the parser without having to support full context sensitivity everywhere. Good luck with the project! And do let me know when you make any progress. I'd love to merge back good ideas and solid implementations into Grammatica. Cheers, /Per On wed, 2005-06-29 at 22:13 +0300, Matti Katila wrote: > Hello, > > First let me introduce you a simple grammar: > > %tokens% > letter = <<[a-z]>> > digit = <<[0-9]>> > b = "b" > > %productions% > S = letter ( letter | digit )* > > > First question: How to define a production for empty file? It seems that > there needs to be at least one character or error is thrown since > non-empty production is illegal. Also, was it documented that first > pattern is used as a startPattern? > > Back to example grammar. With input "acd" all works fine, but with input > "abc" an error is thrown since "b" is not expected token. You may wonder > of course why to create such a "b" there but if we add another production > like: > > bXb = "b" letter "b", > > we win in readability which I have said to be my high consern already. > > To make grammatica even slower I will branch current version into darcs > repository and will contribute a patch witch fixes this tokenizer design > "flaw". Current design doesn't care what tokens the current production is > looking for but tries it best to give a sequense of tokens what tokenizer > thinks as best. Darcs supports distributed developing and it is easy for > others to pick up my patches for grammatica if you are interested in those. > > > -Matti _______________________________________________ Grammatica-users mailing list [email protected] http://lists.nongnu.org/mailman/listinfo/grammatica-users
