2010/7/11 Kevin Brubeck Unhammer <[email protected]>:
> 2010/7/11 Jimmy O'Regan <[email protected]>:
>> The attached patch adds a new mechanism to transfer rules: <exception>
>
> This has been on my wishlist for a while =D
>
>> Exception can contain a single <test> -- if the test evaluates to
>> 'true', the current rule is ignored, and the last applicable rule is
>> used instead (the implication being that it should only be used in
>> rules whose <pattern> contains more than one <pattern-item>).
>>
> [snip]
>> Motivation:
>>
>> The primary motivation was in dealing with Polish: highly inflected
>> (few 'markers'), adjectives can come before or after the noun.
>> Inflection *usually* gives enough information for proper segmentation,
>> but handling it properly would be a matter of having individual rules
>> for each gender, case, and number + each combination of words (i.e.,
>> multiply number of NP rules by 70). I've seen recently that it would
>> help in less inflected languages, so it's probably generally useful.
>
> I just tested it for nb->nn, where I used it to avoid chunking adj.ind
> n.def (the adjective is used adverbially, not modifying the noun),
> which in some cases can be quite important:
>
> Before:
> $ echo Ledelsen liker dårlig fokuset på utøvere som Tommy Ingebrigtsen | apertium -d . nb-nn
> Leiinga likar det dårlege fokuset på utøvarar som Tommy Ingebrigtsen
> ≈ The management likes the bad focus on athletes such as Tommy Ingebrigtsen
>
> After, correct meaning:
> $ echo Ledelsen liker dårlig fokuset på utøvere som Tommy Ingebrigtsen | apertium -d . nb-nn
> Leiinga likar dårleg fokuset på utøvarar som Tommy Ingebrigtsen
> ≈ The management doesn't like ("likes badly") the focus on athletes
> such as Tommy Ingebrigtsen
>
>
> Of course, one can always achieve the same as <exception> by using
> <choose><when> and duplicating the contents of the single-item rules,
> but, well, that means duplicating content… this looks like it would be
> a lot simpler to maintain (and less ugly than output macros).
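
As a rough illustration of the fallback behaviour quoted above (not the
patch's actual code), here is a small Python sketch. It assumes that "the
last applicable rule" means the next-longest rule that also matches at the
same position. The Rule class, the select_rule function and the simplified
"adj.ind" / "n.def" tag strings are all hypothetical names made up for this
example; real transfer rules are written in XML and matched by the
transfer module itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Token = dict  # hypothetical token shape: {"lemma": str, "tags": "adj.ind"}


@dataclass
class Rule:
    name: str
    pattern: Sequence[str]  # one main category per pattern item
    # Stand-in for <exception><test>...</test></exception>: returns True
    # when the rule should be ignored for the matched window.
    exception: Optional[Callable[[Sequence[Token]], bool]] = None


def matches(rule: Rule, tokens: Sequence[Token], pos: int) -> bool:
    """True if every pattern item matches the main category of its token."""
    window = tokens[pos:pos + len(rule.pattern)]
    if len(window) < len(rule.pattern):
        return False
    return all(tok["tags"].split(".")[0] == cat
               for tok, cat in zip(window, rule.pattern))


def select_rule(rules: List[Rule], tokens: Sequence[Token],
                pos: int = 0) -> Optional[Rule]:
    """Pick the longest matching rule whose exception does not fire."""
    candidates = sorted((r for r in rules if matches(r, tokens, pos)),
                        key=lambda r: len(r.pattern), reverse=True)
    for rule in candidates:
        window = tokens[pos:pos + len(rule.pattern)]
        if rule.exception is not None and rule.exception(window):
            continue  # exception is true: ignore this rule, fall back
        return rule
    return None


# Assumed exception: do not chunk an indefinite adjective with a definite noun.
def adj_ind_n_def(window: Sequence[Token]) -> bool:
    return window[0]["tags"] == "adj.ind" and window[1]["tags"] == "n.def"


RULES = [
    Rule("adj_n", ["adj", "n"], exception=adj_ind_n_def),
    Rule("adj", ["adj"]),
    Rule("n", ["n"]),
]

adverbial = [{"lemma": "dårlig", "tags": "adj.ind"},
             {"lemma": "fokus", "tags": "n.def"}]
attributive = [{"lemma": "dårlig", "tags": "adj.def"},
               {"lemma": "fokus", "tags": "n.def"}]

print(select_rule(RULES, adverbial).name)    # falls back to the one-item rule
print(select_rule(RULES, attributive).name)  # keeps the two-item rule
```

The point of the sketch is only that the exception lives on the longer rule,
so the single-item rules do not need to be duplicated inside it.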

I've noticed a lot more rules that could do with this <exception>. At
least a fifth of the sme-nob chunking rules can mis-chunk (e.g. det.loc +
n.ill should not be chunked, but most other combinations of det and n
should be), and the same goes for all the conjunction rules (in the first
interchunk) that merge two chunks.

In the above example I could have just added extra, almost-identical
rules to cover all the patterns (involving a lot of redundancy), but if
the exception depends on target-language information, even that wouldn't
work. E.g. most verbs in both Bokmål and Sámi have adjective forms, so we
allow Sámi <v><adj> to enter into ADJ NOM rules. But some Sámi verbs
translate to a certain class of Bokmål verbs (lexicalised passives) that
don't have adj forms. These get the tag <pstv> in the bidix, but we can't
know that from the <pattern>; here the <exception> would be great.

-Kevin

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
