Re: ambiguity of grammar for font shorthand?

Vincent Hennebert Tue, 22 Sep 2009 04:20:57 -0700

Hi Jonathan,

Jonathan Levinson wrote:
> Hi Vincent,
> 
>  
> 
> As I read the grammar for the font shorthand it is ambiguous, though not
> fatally so as long as one excludes the value of "inherit" from
> individual properties in the font short hand.
> 
>  
> 
>  For instance the first optional argument is font-style, font-weight,
> and font-variant, each of which is optional and can occur in any order.
> All can have the value normal.   So if the value for the font shorthand
> is "normal 10pt Arial" we do not know which of these three is being set
> to normal even though it is harmless and the omitted values will be set
> to normal since that is their initial value.


Actually not: the default value is inherited. If somewhere up in the
hierarchy the font-weight was set to bold, then we don’t know if that
‘normal’ in the font property means that font-weight must be reset to
normal or if it applies to another property. This example you’re
mentioning is truly ambiguous.


> If inherit is allowed to be a value then the grammar truly becomes
> ambiguous since each of these can have the value inherit and we don't
> know which ones are omitted and must take the value normal.
> 
>  
> 
> I think it is probably the case that in the context of the font short
> hand - the font properties cannot take the value of inherit, since this
> renders the grammar irreducibly ambiguous.  While such an exclusion is
> not mentioned in the spec,  it makes sense that inherit must be excluded
> for the reason I've just given.

Excluding inherit for good is a bit too restrictive IMO. I think we
should try to resolve all non-ambiguous cases, like:
normal normal bold
inherit bold italic
inherit inherit inherit
inherit
etc.

Some truly ambiguous values:
normal normal (which one is inherited?)
normal bold inherit (which one is normal, which one inherited?)
normal (which one is normal, which one inherited?)
etc.

A good “exercise” would be to identify all cases that are ambiguous. In
which case an error would be thrown with a “the value is ambiguous”-like
message.


> Prima facie, the grammar (eliminating inherit) looks LL(1) since 
> parsing
> from left to right one can always tell what property one is parsing
> except for the case when one of the first three is assigned normal and
> there are no further values unique to the properties of the first three.
> In this case, one has a special rule (outside the grammar) to
> arbitrarily pick one of the optional properties in the first optional
> argument as the bearer of normal, while the rest receive their initial
> values of normal.

Actually, a “simple” regular expression might be enough. The
java.util.regex package can do wonder. See attached Java file: there
will always be 6 matching groups, some of them possibly being null. The
first three are for style/variant/weight, then font-size, then
line-height, then font families. Some magic would have to be implemented
to identify the first 3 groups. Also, the regex for the individual
properties would have to be refined: “\\w+” is actually wrong for
font-weight. One could imagine to re-use a regex defined for each
sub-property.

However, an LL parser would probably be superior in error handling. The
regular expression would just fail to match, and there’s not much that
can be said about why it fails. An LL parser would probably be able to
tell, say, that the error lies in the declaration of the font-size
property.

I think a good error handling is important, especially to beginners.
I’ve found myself ranting against such meaningless error messages that
don’t tell you at all what your error could be.


> There is a special case where the value of font is inherit and that
> works fine.  Since we are testing if the single token is inherit, we can
> handle that special case in a recursive descent parser.   We create a
> tokenizer which breaks on space and see if the one token returned is
> inherit.
> 
>  
> 
> Also, in your message you said we could ignore a value for font of
> caption, icon, etc., as the standard tells us to do, but the standard
> discusses these values and their relation to system fonts.  Was this an
> oversight on your part or am I mis-reading the spec? [1]

See Alexander’s answer about that.


> I'm not sure we have to go to the complexity of parsing the font short
> hand in a recursive descent manner.  I've updated the open issue (47709)
> to give my reasons why and a solution to the problem of more than two
> fonts separated by commas.  The overly complex code I analyzed looks to
> me like a tokenizer not a parser, and while it could be better written
> (and more understandable) it seems to be doing an adequate job of
> tokenizing, unless I'm still missing something.

There are still cases that it doesn’t handle: spaces around the slash,
different order in style/variant/weight.


Vincent

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package tests;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FontPropertyParser {

    private final Pattern pattern;

    private FontPropertyParser() {
        StringBuilder regex = new StringBuilder();
        regex.append("^");
        // font-style/-variant/-weight
        String fontParam = "(?:(\\w+)\\s+)?";
        regex.append(fontParam).append(fontParam).append(fontParam);
        // font-size
        String size = "((?:\\w+|\\d+(?:\\w+|%)))";
        regex.append(size);
        // line-height
        regex.append("(?:\\s*/\\s*").append(size).append(")?");
        regex.append("\\s+");
        // font families
        String familyWithSpaces = "\\w(?:\\w|\\s)*\\w";
        String fontFamily = "(?:\\w+|'" + familyWithSpaces + "'|\"" + familyWithSpaces + "\")";
        regex.append("(").append(fontFamily).append("(?:\\s*,\\s*").append(fontFamily).append(")*)");
        regex.append('$');
        pattern = Pattern.compile(regex.toString());
    }

    private void match(String s) {
        Matcher m = pattern.matcher(s.trim());
        if (!m.matches()) {
            System.out.println("No match: " + s);
        } else {
            System.out.println("Match: " + m.group(0));
            System.out.println("  " + m.groupCount() + " group(s):");
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println("  " + m.group(i));
            }
            System.out.println();
        }
    }

    public static void main(String[] args) {
        FontPropertyParser parser = new FontPropertyParser();
        parser.match("12pt \"DejaVu Sans\"");
        parser.match("12pt 'DejaVu Sans'");
        parser.match("12pt DejaVuSans");
        parser.match(" 12pt \"DejaVu Sans\" , SansSerif  ");
        parser.match("12pt DejaVuSans, 'Sans Serif'");
        parser.match("normal inherit 12pt / 14pt SansSerif, 'DejaVu Sans'");
        parser.match("bold normal inherit 12pt/14pt  SansSerif, 'DejaVu Sans'");
        parser.match("inherit inherit inherit 12pt  SansSerif, 'DejaVu Sans'");
        parser.match("inherit inherit");
    }
}

Re: ambiguity of grammar for font shorthand?

Reply via email to