On 04-07-2013 11:41, Dávid Nemeskey wrote:
> The words /set/ and /list/ are used interchangeably in CG. This is in
> contrast to how these terms are used in CS, and partly to the
> commonsensical meanings of the words as well. The current planning
> process might be just the right time to fix this issue. I propose to
> say good-bye to /list/.
I agree - you only need SET. I would love to remove LIST from CG-3, but
that is simply not possible in the current plain text format. In XML,
it's trivial.
> <tag>nom</tag> vs <tag n="nom"/>
While <tag>nom</tag> is the most correct form per XML, <tag n="nom"/>
is more readable and shorter. Alternatively, use <t>nom</t> or
<t n="nom"/>, since the fact that it's a tag is clear from context and
the DTD.
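A parser could accept all four spellings and normalize them to the same
value. A minimal sketch in Python with the standard xml.etree.ElementTree
module; the element names tag and t and the n attribute are only the
proposals from this thread, not an existing schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical element names proposed in this thread.
TAG_ELEMENTS = {"tag", "t"}

def tag_value(elem: ET.Element) -> str:
    """Return the tag from <tag n="nom"/>, <tag>nom</tag>, <t n="nom"/> or <t>nom</t>."""
    if elem.tag not in TAG_ELEMENTS:
        raise ValueError(f"not a tag element: <{elem.tag}>")
    # Prefer the attribute form; fall back to the text content.
    value = elem.get("n")
    if value is None:
        value = (elem.text or "").strip()
    if not value:
        raise ValueError("empty tag")
    return value

forms = ['<tag>nom</tag>', '<tag n="nom"/>', '<t>nom</t>', '<t n="nom"/>']
print([tag_value(ET.fromstring(f)) for f in forms])
```

Either way the DTD only has to pick one canonical form for output.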
> I don't know if we even need <set> -- in the construction rules, you
> have to put sets everywhere, and those will have separate XML tags
> anyway.
Why differentiate? It should be the same <set> DTD anywhere it is used,
so simply allow it everywhere, inline or not.
> <set><or><tag n="n"/><tag n="adj"/></or></set>
I would make <or> the default, to allow for the cleaner
<set><t n="n"/><t n="adj"/></set>.
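To make the implicit-<or> default concrete, a minimal Python sketch
(standard xml.etree.ElementTree, hypothetical element names from this
thread) that flattens both spellings to the same member list:

```python
import xml.etree.ElementTree as ET

def set_members(set_elem: ET.Element) -> list[str]:
    """Flatten a <set> into its OR-ed tag values.

    Assumes (hypothetically) that bare children of <set> are OR-ed, so
    <set><t n="n"/><t n="adj"/></set> means the same as
    <set><or><tag n="n"/><tag n="adj"/></or></set>.
    """
    members = []
    for child in set_elem:
        if child.tag == "or":
            # An explicit <or> is just an extra level of grouping.
            members.extend(set_members(child))
        elif child.tag in ("t", "tag"):
            members.append(child.get("n") or (child.text or "").strip())
        else:
            raise ValueError(f"unsupported element in <set>: <{child.tag}>")
    return members

implicit = ET.fromstring('<set><t n="n"/><t n="adj"/></set>')
explicit = ET.fromstring('<set><or><tag n="n"/><tag n="adj"/></or></set>')
print(set_members(implicit), set_members(explicit))
```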
> <sets>...</sets> vs <constraints>...</constraints>
I would allow <set> anywhere that <rule/remove/select/iff> is allowed as
CG-3 does it - it makes it easier for the grammarian to keep sets in the
context they're used in, and does not in any way hinder parsing the grammar.
Also, you need to decide whether mapping operations may be done in the
same section as selection rules. In CG-3, I renamed CONSTRAINTS to
SECTION because I could easily allow any rule type anywhere, so
"constraints" no longer described what was going on.
Does the FST implementation allow for arbitrary tag addition while
retaining speed? If so, I suggest going for CG-3 terminology for sections.
> <condition><cond/><link/></condition>
If you're going to group them by <condition> why have different tags for
<cond> and <link>? The order matters, so make use of it and just have
<cond/><cond/><cond/>.
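Since document order already encodes the chain, a converter only needs
to join the <cond> elements with LINK. A minimal Python sketch, assuming
hypothetical pos and set attributes on <cond> and emitting CG-3 context
syntax:

```python
import xml.etree.ElementTree as ET

def condition_to_cg(cond_group: ET.Element) -> str:
    """Render a <condition> of ordered <cond> elements as a CG-3 context.

    Hypothetical attributes: pos="..." (position) and set="..." (set name).
    The first <cond> is the context test; each following one is linked to
    the previous one purely by document order, so no <link> element is needed.
    """
    parts = [f'{c.get("pos")} {c.get("set")}' for c in cond_group.findall("cond")]
    if not parts:
        raise ValueError("<condition> needs at least one <cond>")
    return "(" + " LINK ".join(parts) + ")"

src = '<condition><cond pos="1" set="N"/><cond pos="1" set="V"/></condition>'
print(condition_to_cg(ET.fromstring(src)))  # (1 N LINK 1 V)
```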
> <barrier>
I would go with <cond><set/><barrier/></cond> as the barrier belongs to
the context test. Explicit <tgt> can easily be optional. Likewise,
<barrier>setname</barrier> can easily be shorthand for <barrier><set
n="setname"/></barrier>.
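Expanding the shorthand is a small normalization pass. A minimal Python
sketch with xml.etree.ElementTree; the element layout is the one
proposed above, and CLB is just a made-up set name:

```python
import xml.etree.ElementTree as ET

def expand_barrier(barrier: ET.Element) -> ET.Element:
    """Rewrite <barrier>setname</barrier> in place as
    <barrier><set n="setname"/></barrier>; the long form is left alone."""
    name = (barrier.text or "").strip()
    if name and len(barrier) == 0:
        barrier.text = None
        ET.SubElement(barrier, "set", n=name)
    return barrier

short = ET.fromstring("<barrier>CLB</barrier>")
print(ET.tostring(expand_barrier(short), encoding="unicode"))
```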
As for the round-trip concern, this is all fully round-trippable. The
fact that the XML only has <set> means little, as it is easy to
determine whether a given set can be represented as a LIST in CG.
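For instance, a serializer could emit LIST when a <set> only ORs plain
tags and SET when it only ORs references to other sets. A minimal Python
sketch under those hypothetical conventions (<t n="..."/> for a tag,
<set n="..."/> for a reference to a named set); mixed cases are simply
rejected here rather than handled:

```python
import xml.etree.ElementTree as ET

def to_cg(name: str, set_elem: ET.Element) -> str:
    """Serialize a <set> as a CG LIST if it only ORs plain tags, else as a SET."""
    items = list(set_elem)
    # A single explicit <or> wrapper means the same as bare children.
    if len(items) == 1 and items[0].tag == "or":
        items = list(items[0])
    if not items:
        raise ValueError(f"set {name} is empty")
    if all(i.tag == "t" for i in items):
        # Only plain tags: representable as a LIST.
        return f"LIST {name} = " + " ".join(i.get("n") for i in items) + " ;"
    if all(i.tag == "set" for i in items):
        # Only references to named sets: needs a SET with OR.
        return f"SET {name} = " + " OR ".join(i.get("n") for i in items) + " ;"
    raise ValueError(f"set {name} mixes tags and set references")

print(to_cg("NA", ET.fromstring('<set><t n="n"/><t n="adj"/></set>')))
```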
But it shouldn't really matter - the plain text CG format is terribly
limiting, so eventually the XML format will be more powerful. E.g., CG-3
can internally handle many more fancy constructs that the plain text
parser simply cannot express.
In general, it looks good. Do consider how much can be made optional and
shorthandable, 'cause I see a lot of possibilities for shorter
expressions of equal power.
-- Tino Didriksen
_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff