Hi Vincent,

Excellent ideas!  

The diagram you drew is extremely useful!

If the font shorthand sub-language has a grammar that is regular then it also 
has a grammar that is LL(1).  So recursive descent parsing will work, if there 
is a regular grammar.

I think the best way of getting font shorthand to work would proceed in stages:

1) First get the current code to properly parse and accept valid font shorthand 
expressions.  This should be very easy.  The one remaining problem (AFAIK) is 
the parsing of font-size/line-height where /line-height is optional.   
Currently spaces are not allowed around the slash "/" and they should be.  I'm 
going to try to get to this problem as soon as I have time, probably in a day 
or so.
2) Evaluate which parser or automaton approach is the simplest and produces 
better error states than the current approach.  
3) Implement the approach one has chosen in (2).

Best Regards,
Jonathan S. Levinson
Senior Software Developer
Object Group

-----Original Message-----
From: Vincent Hennebert [mailto:vhenneb...@gmail.com] 
Sent: Monday, September 28, 2009 8:13 AM
To: fop-dev@xmlgraphics.apache.org
Subject: Re: Questionable whether font-shorthand grammar LL(1)

Hi Jonathan,

Interesting stuff!

Jonathan Levinson wrote:
> Hi Vincent,
> Because font-variant font-style and font-weight can occur in any 
> order, I could not (currently) come up with a grammar in which the 
> directing sets were disjoint for each non-terminal.  So I was unable 
> to come up with an LL(1) grammar.
> For instance, here are two productions of my attempt at a grammar: 
> <style-variant-weight> -> <variant-weight>
> <style-variant-weight> -> <variant-style>
> In each case, the first set of <style-variant-weight> shares a common 
> element in two different productions, the literal values for variant.
> One needs to look ahead one more token to see if one has a 
> <variant-weight> or a <variant-style>.

(I’ll call “modifier” any of the three style, variant, weight
Taking the ‘normal’ case apart, and since ‘inherit’ is not allowed in the 
shorthand, I think the values for all modifiers are distinct:
‘italic’, ‘oblique’, ‘backslant’ for font-style, ‘small-caps’ for font-variant, 
and the various weight values for font-weight.

Since all modifiers are set to their initial values prior to the shorthand 
parsing, which is ‘normal’ for all three of them, I think we can simply ignore 
any ‘normal’ value found in the string. That is, accept it as a legal terminal 
but not do anything.

So I don’t think there is any ambiguity any more. What remains to be done is to 
check that the same modifier is not specified more than once (that includes 
checking that ‘normal’ is not specified more than
3 times). And it’s probably easier to check that at the semantic level instead 
of crafting special grammar rules.

> The books and web articles I read only discussed using recursive 
> descent when the grammar is LL(1).  I have the feeling that despite 
> the ambiguities in the grammar it is almost LL(k) because font-variant and
> font-style and font-weight almost have disjoint values.   It is at least
> LL(3) and I suspect it is LL(6).

The font-size property has the good idea of not allowing ‘normal’ as a value. 
The ‘normal’ case for modifiers can be ignored as explained above. So I think 
the grammar still is LL(1)

> I'm not as convinced as you are that recursive descent parsing or a 
> formal bottom-up-parser will make the code simpler rather than more
> complex because of the complexities of a formal grammar.   Of course,
> however complex the grammar, a table-generating tool - like ANTLR - 
> will generate code, however complex, which will faithfully reflect the 
> inputted grammar.  However, none of the other properties in FOP use a 
> table-generating tool like ANTLR - and I'm not sure what the 
> consequences would be to FOP of introducing such a tool.  Given the 
> complexities of the grammar, I'm sure that a recursive descent parser 
> will be quite complex, and if we are going to use a grammar driven 
> approach we would be better off with a tool that generates parsers 
> from grammars rather than the recursive descent approach.  Also an 
> advantage of parser generators is that one doesn't have to rewrite so 
> much code to correct a mistake in one's grammar, if one makes a 
> mistake, or if the grammar changes.  Recursive descent parsing can 
> pose its own maintenance nightmares.

Using a grammar tool like ANTLR is probably overkill to parse just a shorthand 
property. Moreover the grammar is not likely to change, so that reduces its 
usefulness even more. That said, most properties can accept expressions, where 
such a tool might actually be interesting.
I don’t know how far FOP goes to supporting expressions in other properties.

> The current approach in FOP for font-shorthand is obscurely written 
> but strikes me as basically sound.
> 1)      One parses from right-to-left using the fact that spaces divide
> tokens

The problem is that font families can be specified with strings containing 
whitespace, that must be handled in a specific manner and not as a terminal 
delimitation. Otherwise parsing from right to left would indeed probably be 
relatively easy.

> 2)      One lets property makers determine whether they apply to a
> token.  Each property maker is a little parser of the token one feeds 
> it.  Because the property makers determine whether they apply to a 
> token, one can handle the fact that variant, weight and style can 
> occur in any order by feeding the current token to each of the 
> property makers for font-variant, font-weight, and font-style in turn.  
> Whatever they accept is ipso-facto a font-variant or a font-weight or 
> font-style.
> Just want to let you know I take the problem seriously, and I'm not 
> trying to duck the responsibility of coming up with an adequate 
> solution.  I'm not sure what I did fits into a "job priority" which is 
> why I spent many hours this weekend on this research.

Thanks for looking into this issue. I hope my comments above can help.

As an alternative: the fact that the shorthand property can be parsed using a 
regular expression shows that its grammar is a regular language [1], thus can 
also be parsed using an automaton. I’ve quickly sketched such an automaton that 
I’ve attached to this message. The error handling probably needs to be refined 
but otherwise it should already provide a good basis for a parser.

[1] http://en.wikipedia.org/wiki/Regular_language

> You are free to disagree with my observations and I notice that on 
> fop-dev forums you challenge us to make the code simpler, more 
> reusable, and better structured.

That’s indeed an important thing to keep in mind. I believe an LL parser would 
be relatively easy to implement. If you feel that the current approach could be 
simplified and improved to be as powerful as an LL parser, by all means, go for 


Reply via email to