We've been trying to build a high-performance yet accurate CSS parser using Antlr for the last few months.
To date, our efforts have yielded accuracy but not performance. The main difficulty with CSS is its parsing conventions <http://www.w3.org/TR/CSS21/syndata.html#parsing-errors>, the rules for handling parse errors correctly. There is a core syntax <http://www.w3.org/TR/CSS21/syndata.html#tokenization> that all versions of CSS share. Conceptually, to parse, say, CSS 2.1, we first parse the file according to the core syntax and then flesh out the parse tree with the CSS 2.1 grammar. The core syntax ensures that the right things happen when invalid tokens are seen.

We implemented it this way; see this Stack Overflow question: http://stackoverflow.com/questions/5437835/parsing-css-2-1-with-the-correct-css-parsing-conventions-in-antlr

However, this double parsing creates a new instance of the CSS 2.1 parser for each successfully parsed piece of the core grammar, which results in extremely slow parse times. We also tried rewriting the input stream, adding custom terminators around each piece parsed by the CSS core grammar, and feeding the whole result to the CSS 2.1 parser (augmented with rules for the custom terminators), but this turned out to be even slower.

Is there a way to do better than this in Antlr? At this point we are considering writing a hand-coded recursive descent parser, but hopefully there is a better way with Antlr. :)

Regards,
Vivek
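
P.S. For anyone unfamiliar with the two-pass scheme described above, here is a toy sketch of the idea in plain Python. It is not our Antlr grammar and does not use the Antlr runtime. All names (`KNOWN_PROPERTIES`, `split_rulesets`, `parse_declarations`, `parse_stylesheet`) are hypothetical, the "CSS 2.1 grammar" is reduced to a set of known property names, and the core pass ignores strings, comments, and at-rules. It only shows the shape: the first pass carves the input into pieces using brace matching, and the second pass checks each piece and silently drops what it cannot accept.

```python
# Hypothetical stand-in for a version-specific grammar: a few known properties.
KNOWN_PROPERTIES = {"color", "margin", "width"}


def split_rulesets(css):
    """Pass 1 (core-syntax stand-in): yield (selector, body) per top-level block.

    Simplification: braces are matched by depth only; strings, comments,
    and at-rules are not handled.
    """
    depth = 0
    start = 0          # where the current selector text begins
    selector = ""
    body_start = 0
    for i, ch in enumerate(css):
        if ch == "{":
            if depth == 0:
                selector = css[start:i].strip()
                body_start = i + 1
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                yield selector, css[body_start:i]
                start = i + 1


def parse_declarations(body):
    """Pass 2 (version-specific stand-in): keep only declarations we recognise."""
    declarations = []
    for chunk in body.split(";"):
        name, sep, value = chunk.partition(":")
        name, value = name.strip().lower(), value.strip()
        # Unknown properties and malformed declarations are dropped, not fatal.
        if sep and name in KNOWN_PROPERTIES and value:
            declarations.append((name, value))
    return declarations


def parse_stylesheet(css):
    """Run pass 1, then apply pass 2 to each piece it produced."""
    return [(sel, parse_declarations(body)) for sel, body in split_rulesets(css)]
```

With the input `p { color: red; colr: blue; margin: 0 } h1 { width: 10px }`, the misspelled `colr` declaration is dropped while the rest of the `p` ruleset survives, which is the kind of recovery the core syntax is meant to guarantee.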
