>> But a more robust and forgiving parser is needed.
>
> Well at least it would be nice if it could report errors, but that's a
> pain to do with LR parsers.

SmaCC can report errors. I guess that Citezen needs something that can
silently parse across errors: something that builds as complete a model
as possible from input that is not necessarily entirely valid.

I suggest that you have a look at the PetitParser PEG framework
(http://source.lukas-renggli.ch/petit.html). It uses an object-oriented
approach, pure Smalltalk syntax, and comes with an extensive test suite
that covers 100% of the code. Ambiguous grammars are supported, and you
can speed them up using memoization if you want to.

> I think I have an OMeta attempt somewhere but it will probably be
> slower than SmaCC.

Parsing all methods of Object and Morph (1390 methods in total) takes:

     688ms with the hand-written RBParser,
     806ms with the hand-written Squeak parser,
    2518ms with the pre-compiled and heavily optimized SmaCC parser,
    3700ms with the unoptimized PetitParser Smalltalk parser.

I don't know where OMeta would be in this comparison; unfortunately, it
only includes a parser for Smalltalk expressions.

A probably less accurate comparison, with a single expression (the
factorial function) parsed 1000 times by each expression parser, gives:

     560ms with the hand-written RBParser,
     602ms with the hand-written Squeak parser,
    2564ms with the pre-compiled and heavily optimized SmaCC parser,
    4867ms with the unoptimized PetitParser Smalltalk parser,
   25098ms with the OMeta Smalltalk expression parser.

I have no idea why OMeta is so slow. Otherwise, however, I conclude that
it doesn't matter much speed-wise what kind of parser you pick. LR
parsers are probably not that much of a hype anymore ;-)

Cheers,
Lukas

--
Lukas Renggli
http://www.lukas-renggli.ch