On Thu, 08 Nov 2012 at 11:06 +0100, Kevin Brubeck Unhammer wrote:
> Per Tunedal <[email protected]>
> writes:
> 
> [...]
> 
> > The noun "kjempe" is advertised as possible to use in compounds, yet
> > there is an entry for the adjective "kjempehøy" (= very high/tall). Why?
> 
> Assume you have dynamic[1] compounding turned on for the open classes
> nouns, verbs, adjectives – these are all fairly common in compounding
> (though nouns cover over 70 % in nn/nb), and you remove "kjempehøy" from
> your dictionary.
> 
> Now, since nb.dix has these analyses of "kjempe" and "høy":
> 
>     kjempe<vblex><inf>/kjempe<n><m><sg><ind>/kjempe<n><f><sg><ind>/kjempe<n><m><sg><ind>/kjempe<n><f><sg><ind>
>     høye<vblex><imp>/høy<n><nt><sg><ind>/høy<n><nt><pl><ind>/høy<adj><posi><mf><sg><ind>
> 
> your compound analysis will be ambiguous over at least:
> 
>     kjempe<n><f><sg><ind>+høy<n><nt><pl><ind>
>     kjempe<n><f><sg><ind>+høy<n><nt><sg><ind>
>     kjempe<n><f><sg><ind>+høye<vblex><imp>
>     kjempe<n><f><sg><ind>+høy<adj><posi><mf><sg><ind>
>     kjempe<n><m><sg><ind>+høy<n><nt><sg><ind>
>     kjempe<n><m><sg><ind>+høy<n><nt><pl><ind>
>     kjempe<n><m><sg><ind>+høye<vblex><imp>
>     kjempe<n><m><sg><ind>+høy<adj><posi><mf><sg><ind>
>     kjempe<vblex><inf>+høy<n><nt><pl><ind>
>     kjempe<vblex><inf>+høy<n><nt><sg><ind>
>     kjempe<vblex><inf>+høye<vblex><imp>
>     kjempe<vblex><inf>+høy<adj><posi><mf><sg><ind>
> 
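As a rough illustration of where that count comes from, the sketch below crosses the two analysis strings quoted above. It is only a toy: the helper names are made up for this example, and it says nothing about how the real analyser builds its output.

```python
from itertools import product

# Analysis strings copied from the nb.dix output quoted above (repeats included).
KJEMPE = ("kjempe<vblex><inf>/kjempe<n><m><sg><ind>/kjempe<n><f><sg><ind>"
          "/kjempe<n><m><sg><ind>/kjempe<n><f><sg><ind>")
HOY = ("høye<vblex><imp>/høy<n><nt><sg><ind>/høy<n><nt><pl><ind>"
       "/høy<adj><posi><mf><sg><ind>")


def unique_analyses(lattice):
    """Split a '/'-separated analysis string, dropping repeats but keeping order."""
    return list(dict.fromkeys(lattice.split("/")))


def compound_readings(left, right):
    """Every left+right pairing of the unique analyses (hypothetical helper)."""
    return [f"{l}+{r}"
            for l, r in product(unique_analyses(left), unique_analyses(right))]


readings = compound_readings(KJEMPE, HOY)
print(len(readings))  # 12: three unique readings of "kjempe" times four of "høy"
```

Every extra reading of either part multiplies the total, which is why the list grows so quickly.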
> And it gets even worse if there's some possibility of segmenting at the
> wrong place, e.g. Bokmål 'te+skje' (tea+spoon) could be mis-segmented
> 'te+s+kje' (tea+epenthetic+kid goat), similarly 'bilde+liste'
> (image+list) vs 'bildel+iste' (car part+iced tea).
> 
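The mis-segmentation problem can be reproduced with a small brute-force splitter. This is a minimal sketch under loud assumptions: `LEXICON` and `LINKERS` are hypothetical toy data containing only the stems from the examples above, and the real analyser does not work by brute force like this.

```python
# Toy stand-in for the analyser's stems (an assumption, not taken from nb.dix).
LEXICON = {"bilde", "liste", "bildel", "iste", "te", "skje", "kje"}
LINKERS = {"s"}  # epenthetic elements allowed between two parts


def segmentations(word, lexicon=LEXICON, linkers=LINKERS):
    """Return every split of `word` into two or more lexicon entries."""
    results = []

    def walk(rest, parts):
        if not rest:
            results.append("+".join(parts))
            return
        for i in range(1, len(rest) + 1):
            head, tail = rest[:i], rest[i:]
            if head not in lexicon:
                continue
            walk(tail, parts + [head])
            # A linker may follow a stem only if more material comes after it.
            for link in linkers:
                if tail.startswith(link) and len(tail) > len(link):
                    walk(tail[len(link):], parts + [head, link])

    walk(word, [])
    return [r for r in results if "+" in r]


print(segmentations("teskje"))      # ['te+skje', 'te+s+kje']
print(segmentations("bildeliste"))  # ['bilde+liste', 'bildel+iste']
```

Nothing in the lexicon alone says which split is the intended one; that has to come from somewhere else, such as a lexicalised entry.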
> Compare this with the ambiguity-count of the analysis given when we do
> have "kjempehøy" in the dictionary:
> 
> kjempehøy<adj><posi><mf><sg><ind>
> 
> Only one analysis, and it's the correct one. 
> 
> So you avoid useless ambiguity by adding more compounds. Useless
> ambiguity is harmful not only to the translation of that word, but also
> to that of the context (given the sequence "<adj> <vblex>/<n>", it's easy
> to see that the second word is most likely a noun, not so with
> "<adj>/<n>/<vblex> <vblex>/<n>").
> 
> 
> In addition to all that, a decompounding analysis takes a lot longer per
> word than a simple analysis (you have to check all the possible ways of
> segmenting the word into two parts, then three parts, etc.). Adding full
> compound words also helps when decompounding compounds of compounds:
> it's safer and faster to segment 'bildeliste+generator' than
> 'bilde+liste+generator', where you might end up with
> 'bildel+iste+generator'.
> 
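To get a feel for the "two parts, then three parts" cost, one can count the raw cut positions. This is only an upper bound on the search space under a naive brute-force assumption; a finite-state analyser prunes most of these candidates early, and `candidate_splits` is a name invented for this sketch.

```python
from math import comb


def candidate_splits(word, parts):
    """Number of ways to cut `word` into exactly `parts` non-empty pieces."""
    return comb(len(word) - 1, parts - 1)


word = "bildelistegenerator"  # 19 letters, so 18 possible cut positions
for k in (1, 2, 3):
    print(k, candidate_splits(word, k))
# 1 1
# 2 18
# 3 153
```

With 'bildeliste' in the dictionary, the two-part search is enough for this word, so the larger three-part space never has to be explored.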
> Aaand, finally, sometimes the whole is greater than the sum of its
> parts, e.g. Bokmål 'kjempemessig' might be better translated to 'ovstor'
> or 'diger' in Nynorsk, 'bedømmelseskommité'→'domsnemnd' etc.
> 
> 
> In summary: Dynamic compounding leads to more ambiguity and slower
> analysis, and is thus used only when there is no lexicalised analysis.
> Adding lexicalised compounds improves not only analysis of those
> compounds and their contexts, but also improves dynamic compounding of
> longer compounds.
> 
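The "only when there is no lexicalised analysis" rule can be sketched as a plain fallback. All names and data here are hypothetical toy values (the tag strings mirror the analyses quoted earlier); this is not the analyser's actual lookup code.

```python
from itertools import product

# Hypothetical toy data; tag strings follow the analyses quoted earlier.
LEXICALISED = {"kjempehøy": ["kjempehøy<adj><posi><mf><sg><ind>"]}
PARTS = {
    "kjempe": ["kjempe<vblex><inf>", "kjempe<n><m><sg><ind>",
               "kjempe<n><f><sg><ind>"],
    "høy": ["høye<vblex><imp>", "høy<n><nt><sg><ind>",
            "høy<n><nt><pl><ind>", "høy<adj><posi><mf><sg><ind>"],
}


def dynamic_analyses(word):
    """Guess two-part compounds by trying every split point against PARTS."""
    found = []
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in PARTS and right in PARTS:
            found += [f"{l}+{r}" for l, r in product(PARTS[left], PARTS[right])]
    return found


def analyse(word, lexicalised=LEXICALISED):
    """Use the lexicalised entry if there is one; otherwise fall back to guessing."""
    if word in lexicalised:
        return lexicalised[word]
    return dynamic_analyses(word)


print(len(analyse("kjempehøy")))                  # 1
print(len(analyse("kjempehøy", lexicalised={})))  # 12
```

Removing the dictionary entry is what turns one analysis into twelve.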
> 
> > BTW I've found only one similar Danish word: "kæmpestor" (very large). I
> > don't know if there are any more.
> 
> If "kæmpe-" is not very productive in Danish, it might be better to
> translate those words into something else (kjempelett→pærelet,
> kjempegod→knippelgod?). Adding such pairs as lexicalised compounds in
> the dictionaries will override dynamic compounding for those words.
> 
> 
> 
> [1] Dynamic compounding is when the analyser only contains the parts and
>     guesses how they fit together; lexicalised compounds are those we
>     spell out completely in the dictionary.

Tour de force. This should be on the Wiki!

Fran


