2010/7/11 Kevin Brubeck Unhammer <[email protected]>:
> 2010/7/11 Jimmy O'Regan <[email protected]>:
>> The attached patch adds a new mechanism to transfer rules: <exception>
>
> This has been on my wishlist for a while =D
>
>> Exception can contain a single <test> -- if the test evaluates to
>> 'true', the current rule is ignored, and the last applicable rule is
>> used instead (the implication being that it should only be used in
>> rules whose <pattern> contains more than one <pattern-item>).
>>
> [snip]
>> Motivation:
>>
>> The primary motivation was in dealing with Polish: highly inflected
>> (few 'markers'), adjectives can come before or after the noun.
>> Inflection *usually* gives enough information for proper segmentation,
>> but handling it properly would be a matter of having individual rules
>> for each gender, case, and number + each combination of words (i.e.,
>> multiply number of NP rules by 70). I've seen recently that it would
>> help in less inflected languages, so it's probably generally useful.
>
> I just tested it for nb->nn, where I used it to avoid chunking adj.ind
> n.def (the adjective is used adverbially, not modifying the noun),
> which in some cases can be quite important:
>
> Before:
> $ echo Ledelsen liker dårlig fokuset på utøvere som Tommy Ingebrigtsen | apertium -d . nb-nn
> Leiinga likar det dårlege fokuset på utøvarar som Tommy Ingebrigtsen
> ≈ The management likes the bad focus on athletes such as Tommy Ingebrigtsen
>
> After, correct meaning:
> $ echo Ledelsen liker dårlig fokuset på utøvere som Tommy Ingebrigtsen | apertium -d . nb-nn
> Leiinga likar dårleg fokuset på utøvarar som Tommy Ingebrigtsen
> ≈ The management doesn't like ("likes badly") the focus on athletes
> such as Tommy Ingebrigtsen
>
>
> Of course, one can always achieve the same as <exception> by using
> <choose><when> and duplicating the contents of the single-item rules,
> but, well, that means duplicating content… this looks like it would be
> a lot simpler to maintain (and less ugly than output macros).

I've since noticed many more rules that could do with this <exception>.
At least a fifth of the sme-nob chunking rules can mis-chunk (e.g.
det.loc + n.ill should not be chunked, although most other cases of
det and n should be), and the same goes for all the conjunction rules
(in the first interchunk) that merge two chunks.
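
A rough sketch of how the det.loc + n.ill case might be written, purely
for illustration. The patch description only says that <exception> holds
a single <test>, so its position between <pattern> and <action> is an
assumption here, and the names "det", "n" and "case" are hypothetical
stand-ins for whatever the real sme-nob .t1x file defines:

```xml
<!-- Sketch only: "det", "n" and "case" are hypothetical def-cat and
     def-attr names; the position of <exception> in <rule> is assumed. -->
<rule comment="det n: chunk as NP, except det.loc + n.ill">
  <pattern>
    <pattern-item n="det"/>
    <pattern-item n="n"/>
  </pattern>
  <exception>
    <test>
      <and>
        <!-- first word is a determiner in the locative -->
        <equal>
          <clip pos="1" side="sl" part="case"/>
          <lit-tag v="loc"/>
        </equal>
        <!-- second word is a noun in the illative -->
        <equal>
          <clip pos="2" side="sl" part="case"/>
          <lit-tag v="ill"/>
        </equal>
      </and>
    </test>
  </exception>
  <action>
    <!-- the usual NP chunk output would go here -->
  </action>
</rule>
```

If the test is true, this rule is skipped and the single-item rules for
det and n apply instead, so the two words end up in separate chunks.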

In the above example I could have just added extra almost-identical
rules to cover all the patterns (involving a lot of redundancy), but
if the exception depends on target-language information, even that
wouldn't work. E.g. most verbs in both Bokmål and Sámi have adjective
forms, so we allow Sámi <v><adj> to enter into ADJ NOM rules. But some
Sámi verbs translate to a certain class of Bokmål verbs (lexicalised
passives) that don't have adj forms. These get the tag <pstv> in the
bidix, but we can't know that from the <pattern>; here the <exception>
would be great.
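
Something along these lines, again only as a sketch. The def-attr
"pstv", the category names "vadj" and "nom", and the placement of
<exception> are all assumptions made for illustration, not taken from
the patch or from the actual sme-nob files:

```xml
<!-- Hypothetical attribute so the test can clip the <pstv> tag;
     it would live in the file's section-def-attrs. -->
<def-attr n="pstv">
  <attr-item tags="pstv"/>
</def-attr>

<!-- Sketch only: "vadj" and "nom" are hypothetical def-cat names. -->
<rule comment="ADJ NOM with a verbal adjective, unless TL verb is pstv">
  <pattern>
    <pattern-item n="vadj"/>
    <pattern-item n="nom"/>
  </pattern>
  <exception>
    <test>
      <!-- side="tl" looks at the bidix translation, not the SL pattern -->
      <equal>
        <clip pos="1" side="tl" part="pstv"/>
        <lit-tag v="pstv"/>
      </equal>
    </test>
  </exception>
  <action>
    <!-- the usual ADJ NOM chunk output would go here -->
  </action>
</rule>
```

The point is that the test can clip from side="tl", which the <pattern>
cannot do, so no amount of extra source-side rules would cover this.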


-Kevin

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
