Hi Max, Max Berger wrote: > Hi *, > > I just want to throw in a different idea (you may ignore it if you like): > > How about specifing the grammer and using a tool such as JavaCC to > generate the actual parser? This way you could focus more complete > grammer and have to spend less time writing the parser.
That would be the same as using ANTLR. I feel that this is a bit overkill for just parsing the font shorthand property, although that may prove to be useful for other properties that can accept complex expressions. That said, JavaCC is an interesting suggestion, I didn’t think of it. If a choice had to be made between ANTLR and JavaCC, which one would win? > JavaCC is BSD license, so we could easily integrate it in the fop build. > > Max Thanks, Vincent > 2009/9/28 Vincent Hennebert: >> Hi Jonathan, >> >> Interesting stuff! >> >> Jonathan Levinson wrote: >>> Hi Vincent, >>> >> <snip/> >>> Because font-variant font-style and font-weight can occur in any order, >>> I could not (currently) come up with a grammar in which the directing >>> sets were disjoint for each non-terminal. So I was unable to come up >>> with an LL(1) grammar. >>> >>> For instance, here are two productions of my attempt at a grammar: >>> >>> <style-variant-weight> -> <variant-weight> >>> >>> <style-variant-weight> -> <variant-style> >>> >>> In each case, the first set of <style-variant-weight> shares a common >>> element in two different productions, the literal values for variant. >>> One needs to look ahead one more token to see if one has a >>> <variant-weight> or a <variant-style>. >> (I’ll call “modifier” any of the three style, variant, weight >> properties.) >> Taking the ‘normal’ case apart, and since ‘inherit’ is not allowed in >> the shorthand, I think the values for all modifiers are distinct: >> ‘italic’, ‘oblique’, ‘backslant’ for font-style, ‘small-caps’ for >> font-variant, and the various weight values for font-weight. >> >> Since all modifiers are set to their initial values prior to the >> shorthand parsing, which is ‘normal’ for all three of them, I think we >> can simply ignore any ‘normal’ value found in the string. That is, >> accept it as a legal terminal but not do anything. >> >> So I don’t think there is any ambiguity any more. What remains to be >> done is to check that the same modifier is not specified more than once >> (that includes checking that ‘normal’ is not specified more than >> 3 times). And it’s probably easier to check that at the semantic level >> instead of crafting special grammar rules. >> >> >> <snip/> >>> The books and web articles I read only discussed using recursive descent >>> when the grammar is LL(1). I have the feeling that despite the >>> ambiguities in the grammar it is almost LL(k) because font-variant and >>> font-style and font-weight almost have disjoint values. It is at least >>> LL(3) and I suspect it is LL(6). >> The font-size property has the good idea of not allowing ‘normal’ as >> a value. The ‘normal’ case for modifiers can be ignored as explained >> above. So I think the grammar still is LL(1) >> >> >> <snip/> >>> I'm not as convinced as you are that recursive descent parsing or a >>> formal bottom-up-parser will make the code simpler rather than more >>> complex because of the complexities of a formal grammar. Of course, >>> however complex the grammar, a table-generating tool - like ANTLR - will >>> generate code, however complex, which will faithfully reflect the >>> inputted grammar. However, none of the other properties in FOP use a >>> table-generating tool like ANTLR - and I'm not sure what the >>> consequences would be to FOP of introducing such a tool. Given the >>> complexities of the grammar, I'm sure that a recursive descent parser >>> will be quite complex, and if we are going to use a grammar driven >>> approach we would be better off with a tool that generates parsers from >>> grammars rather than the recursive descent approach. Also an advantage >>> of parser generators is that one doesn't have to rewrite so much code to >>> correct a mistake in one's grammar, if one makes a mistake, or if the >>> grammar changes. Recursive descent parsing can pose its own maintenance >>> nightmares. >> Using a grammar tool like ANTLR is probably overkill to parse just >> a shorthand property. Moreover the grammar is not likely to change, so >> that reduces its usefulness even more. That said, most properties can >> accept expressions, where such a tool might actually be interesting. >> I don’t know how far FOP goes to supporting expressions in other >> properties. >> >> >>> The current approach in FOP for font-shorthand is obscurely written but >>> strikes me as basically sound. >>> >>> 1) One parses from right-to-left using the fact that spaces divide >>> tokens >> The problem is that font families can be specified with strings >> containing whitespace, that must be handled in a specific manner and not >> as a terminal delimitation. Otherwise parsing from right to left would >> indeed probably be relatively easy. >> >> >>> 2) One lets property makers determine whether they apply to a >>> token. Each property maker is a little parser of the token one feeds >>> it. Because the property makers determine whether they apply to a >>> token, one can handle the fact that variant, weight and style can occur >>> in any order by feeding the current token to each of the property makers >>> for font-variant, font-weight, and font-style in turn. Whatever they >>> accept is ipso-facto a font-variant or a font-weight or font-style. >>> >>> Just want to let you know I take the problem seriously, and I'm not >>> trying to duck the responsibility of coming up with an adequate >>> solution. I'm not sure what I did fits into a "job priority" which is >>> why I spent many hours this weekend on this research. >> Thanks for looking into this issue. I hope my comments above can help. >> >> As an alternative: the fact that the shorthand property can be parsed >> using a regular expression shows that its grammar is a regular >> language , thus can also be parsed using an automaton. I’ve quickly >> sketched such an automaton that I’ve attached to this message. The error >> handling probably needs to be refined but otherwise it should already >> provide a good basis for a parser. >> >>  http://en.wikipedia.org/wiki/Regular_language >> >> >>> You are free to disagree with my observations and I notice that on >>> fop-dev forums you challenge us to make the code simpler, more reusable, >>> and better structured. >> That’s indeed an important thing to keep in mind. I believe an LL parser >> would be relatively easy to implement. If you feel that the current >> approach could be simplified and improved to be as powerful as an LL >> parser, by all means, go for it. >> >> >> Thanks, >> Vincent >>