Hi Vincent, Excellent ideas!
The diagram you drew is extremely useful! If the font shorthand sub-language has a grammar that is regular then it also has a grammar that is LL(1). So recursive descent parsing will work, if there is a regular grammar. I think the best way of getting font shorthand to work would proceed in stages: 1) First get the current code to properly parse and accept valid font shorthand expressions. This should be very easy. The one remaining problem (AFAIK) is the parsing of font-size/line-height where /line-height is optional. Currently spaces are not allowed around the slash "/" and they should be. I'm going to try to get to this problem as soon as I have time, probably in a day or so. 2) Evaluate which parser or automaton approach is the simplest and produces better error states than the current approach. 3) Implement the approach one has chosen in (2). Best Regards, Jonathan S. Levinson Senior Software Developer Object Group InterSystems 617-621-0600 -----Original Message----- From: Vincent Hennebert [mailto:vhenneb...@gmail.com] Sent: Monday, September 28, 2009 8:13 AM To: email@example.com Subject: Re: Questionable whether font-shorthand grammar LL(1) Hi Jonathan, Interesting stuff! Jonathan Levinson wrote: > Hi Vincent, > <snip/> > > Because font-variant font-style and font-weight can occur in any > order, I could not (currently) come up with a grammar in which the > directing sets were disjoint for each non-terminal. So I was unable > to come up with an LL(1) grammar. > > For instance, here are two productions of my attempt at a grammar: > > <style-variant-weight> -> <variant-weight> > > <style-variant-weight> -> <variant-style> > > In each case, the first set of <style-variant-weight> shares a common > element in two different productions, the literal values for variant. > One needs to look ahead one more token to see if one has a > <variant-weight> or a <variant-style>. (I’ll call “modifier” any of the three style, variant, weight properties.) Taking the ‘normal’ case apart, and since ‘inherit’ is not allowed in the shorthand, I think the values for all modifiers are distinct: ‘italic’, ‘oblique’, ‘backslant’ for font-style, ‘small-caps’ for font-variant, and the various weight values for font-weight. Since all modifiers are set to their initial values prior to the shorthand parsing, which is ‘normal’ for all three of them, I think we can simply ignore any ‘normal’ value found in the string. That is, accept it as a legal terminal but not do anything. So I don’t think there is any ambiguity any more. What remains to be done is to check that the same modifier is not specified more than once (that includes checking that ‘normal’ is not specified more than 3 times). And it’s probably easier to check that at the semantic level instead of crafting special grammar rules. <snip/> > The books and web articles I read only discussed using recursive > descent when the grammar is LL(1). I have the feeling that despite > the ambiguities in the grammar it is almost LL(k) because font-variant and > font-style and font-weight almost have disjoint values. It is at least > LL(3) and I suspect it is LL(6). The font-size property has the good idea of not allowing ‘normal’ as a value. The ‘normal’ case for modifiers can be ignored as explained above. So I think the grammar still is LL(1) <snip/> > I'm not as convinced as you are that recursive descent parsing or a > formal bottom-up-parser will make the code simpler rather than more > complex because of the complexities of a formal grammar. Of course, > however complex the grammar, a table-generating tool - like ANTLR - > will generate code, however complex, which will faithfully reflect the > inputted grammar. However, none of the other properties in FOP use a > table-generating tool like ANTLR - and I'm not sure what the > consequences would be to FOP of introducing such a tool. Given the > complexities of the grammar, I'm sure that a recursive descent parser > will be quite complex, and if we are going to use a grammar driven > approach we would be better off with a tool that generates parsers > from grammars rather than the recursive descent approach. Also an > advantage of parser generators is that one doesn't have to rewrite so > much code to correct a mistake in one's grammar, if one makes a > mistake, or if the grammar changes. Recursive descent parsing can > pose its own maintenance nightmares. Using a grammar tool like ANTLR is probably overkill to parse just a shorthand property. Moreover the grammar is not likely to change, so that reduces its usefulness even more. That said, most properties can accept expressions, where such a tool might actually be interesting. I don’t know how far FOP goes to supporting expressions in other properties. > The current approach in FOP for font-shorthand is obscurely written > but strikes me as basically sound. > > 1) One parses from right-to-left using the fact that spaces divide > tokens The problem is that font families can be specified with strings containing whitespace, that must be handled in a specific manner and not as a terminal delimitation. Otherwise parsing from right to left would indeed probably be relatively easy. > 2) One lets property makers determine whether they apply to a > token. Each property maker is a little parser of the token one feeds > it. Because the property makers determine whether they apply to a > token, one can handle the fact that variant, weight and style can > occur in any order by feeding the current token to each of the > property makers for font-variant, font-weight, and font-style in turn. > Whatever they accept is ipso-facto a font-variant or a font-weight or > font-style. > > Just want to let you know I take the problem seriously, and I'm not > trying to duck the responsibility of coming up with an adequate > solution. I'm not sure what I did fits into a "job priority" which is > why I spent many hours this weekend on this research. Thanks for looking into this issue. I hope my comments above can help. As an alternative: the fact that the shorthand property can be parsed using a regular expression shows that its grammar is a regular language , thus can also be parsed using an automaton. I’ve quickly sketched such an automaton that I’ve attached to this message. The error handling probably needs to be refined but otherwise it should already provide a good basis for a parser.  http://en.wikipedia.org/wiki/Regular_language > You are free to disagree with my observations and I notice that on > fop-dev forums you challenge us to make the code simpler, more > reusable, and better structured. That’s indeed an important thing to keep in mind. I believe an LL parser would be relatively easy to implement. If you feel that the current approach could be simplified and improved to be as powerful as an LL parser, by all means, go for it. Thanks, Vincent