On Thu, 08 Nov 2012 at 11:06 +0100, Kevin Brubeck Unhammer wrote:
> Per Tunedal <[email protected]>
> writes:
>
> [...]
>
> > The noun "kjempe" is advertised as possible to use in compounds, yet
> > there is an entry for the adjective "kjempehøy" (= very high/tall). Why?
>
> Assume you have dynamic[1] compounding turned on for the open classes
> nouns, verbs, adjectives – these are all fairly common in compounding
> (though nouns cover over 70 % in nn/nb), and you remove "kjempehøy" from
> your dictionary.
>
> Now, since nb.dix has these analyses of "kjempe" and "høy":
>
>     kjempe<vblex><inf>/kjempe<n><m><sg><ind>/kjempe<n><f><sg><ind>/kjempe<n><m><sg><ind>/kjempe<n><f><sg><ind>
>     høye<vblex><imp>/høy<n><nt><sg><ind>/høy<n><nt><pl><ind>/høy<adj><posi><mf><sg><ind>
>
> your compound analysis will be ambiguous over at least:
>
>     kjempe<n><f><sg><ind>+høy<n><nt><pl><ind>
>     kjempe<n><f><sg><ind>+høy<n><nt><sg><ind>
>     kjempe<n><f><sg><ind>+høye<vblex><imp>
>     kjempe<n><f><sg><ind>+høy<adj><posi><mf><sg><ind>
>     kjempe<n><m><sg><ind>+høy<n><nt><sg><ind>
>     kjempe<n><m><sg><ind>+høy<n><nt><pl><ind>
>     kjempe<n><m><sg><ind>+høye<vblex><imp>
>     kjempe<n><m><sg><ind>+høy<adj><posi><mf><sg><ind>
>     kjempe<vblex><inf>+høy<n><nt><pl><ind>
>     kjempe<vblex><inf>+høy<n><nt><sg><ind>
>     kjempe<vblex><inf>+høye<vblex><imp>
>     kjempe<vblex><inf>+høy<adj><posi><mf><sg><ind>
>
> And it gets even worse if there's some possibility of segmenting at the
> wrong place, e.g. Bokmål 'te+skje' (tea+spoon) could be mis-segmented
> 'te+s+kje' (tea+epenthetic+kid goat), similarly 'bilde+liste'
> (image+list) vs 'bildel+iste' (car part+iced tea).
>
> Compare this with the ambiguity count of the analysis given when we do
> have "kjempehøy" in the dictionary:
>
>     kjempehøy<adj><posi><mf><sg><ind>
>
> Only one analysis, and it's the correct one.
>
> So you avoid useless ambiguity by adding more compounds.
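The ambiguity above is combinatorial: with dynamic compounding, every possible segmentation of the word is crossed with every combination of its parts' analyses. A minimal Python sketch of that, assuming a toy in-memory lexicon rather than the real nb.dix or analyser. The `kjempe` and `høy` readings are copied from the output quoted above (duplicates removed); the `te`/`skje`/`s`/`kje` entries, their bare `<n>` tags and the `<linker>` tag for the epenthetic -s- are hypothetical, made up for illustration.

```python
from itertools import product

# Toy lexicon, NOT nb.dix: surface form -> list of analyses.
# "kjempe" and "høy" use the readings quoted above (duplicates removed);
# the remaining entries and their tags are hypothetical.
LEXICON = {
    "kjempe": ["kjempe<vblex><inf>", "kjempe<n><m><sg><ind>",
               "kjempe<n><f><sg><ind>"],
    "høy": ["høye<vblex><imp>", "høy<n><nt><sg><ind>",
            "høy<n><nt><pl><ind>", "høy<adj><posi><mf><sg><ind>"],
    "te": ["te<n>"],
    "skje": ["skje<n>"],
    "s": ["s<linker>"],  # hypothetical stand-in for the epenthetic -s-
    "kje": ["kje<n>"],
}


def segmentations(word, parts):
    """Return every way to write `word` as a concatenation of known parts."""
    if word == "":
        return [[]]
    result = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in parts:
            for rest in segmentations(word[i:], parts):
                result.append([head] + rest)
    return result


def compound_analyses(word, lexicon):
    """Dynamic compounding: each segmentation into two or more parts,
    crossed with every combination of the parts' analyses."""
    readings = []
    for seg in segmentations(word, lexicon):
        if len(seg) < 2:
            continue  # a single part is a plain lookup, not a compound
        for combo in product(*(lexicon[part] for part in seg)):
            readings.append("+".join(combo))
    return readings


print(len(compound_analyses("kjempehøy", LEXICON)))  # 12 (3 x 4)
print(segmentations("teskje", LEXICON))  # [['te', 's', 'kje'], ['te', 'skje']]
```

An analyser that consults lexicalised entries first would never reach this step for "kjempehøy" once the word is in the dictionary; it would return the single lexicalised reading instead.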
> Useless ambiguity is harmful not only to the translation of that word,
> but also to the translation of its context (given the sequence
> "<adj> <vblex>/<n>", it's easy to see that the second word is most likely
> a noun; not so with "<adj>/<n>/<vblex> <vblex>/<n>").
>
> In addition to all that, a decompounding analysis takes a lot longer per
> word than a simple analysis (you have to check all the possible ways of
> segmenting the word into two parts, then three parts, etc.), and adding
> full compound words further helps the decompounding of compounds of
> compounds (it's safer and faster to segment 'bildeliste+generator' than
> 'bilde+liste+generator', where you might end up with
> 'bildel+iste+generator').
>
> Aaand, finally, sometimes the sum is greater than the parts, e.g.
> Bokmål 'kjempemessig' might be better translated to 'ovstor' or 'diger'
> in Nynorsk, 'bedømmelseskommité'→'domsnemnd' etc.
>
> In summary: Dynamic compounding leads to more ambiguity and slower
> analysis, and is thus used only when there is no lexicalised analysis.
> Adding lexicalised compounds improves not only the analysis of those
> compounds and their contexts, but also the dynamic compounding of
> longer compounds.
>
> > BTW I've found only one similar Danish word: "kæmpestor" (very large). I
> > don't know if there are any more.
>
> If "kæmpe-" is not very productive in Danish, it might be better to
> translate those words into something else (kjempelett→pærelet,
> kjempegod→knippelgod?). Adding such pairs as lexicalised compounds in
> the dictionaries will override dynamic compounding for those words.
>
> [1] Dynamic compounding is when the analyser only contains the parts and
> guesses how they fit together; lexicalised compounds are defined as
> those we spell out completely in the dictionary.
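The lookup order from the summary, together with the "compounds of compounds" point, can be sketched in Python. This is a simplification under stated assumptions: the analyser tries a lexicalised entry first and otherwise keeps only the segmentations with the fewest parts. That ranking is an assumed heuristic, not necessarily what a real Apertium analyser does. The word list `PARTS` is a hypothetical toy set of bare surface forms, not dictionary data.

```python
# Hypothetical toy word list (surface forms only, no tags).
PARTS = {"bil", "bilde", "del", "bildel", "liste", "iste", "generator"}


def segmentations(word, parts):
    """Return every way to write `word` as a concatenation of known parts."""
    if word == "":
        return [[]]
    result = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in parts:
            for rest in segmentations(word[i:], parts):
                result.append([head] + rest)
    return result


def analyse(word, lexicon):
    """Use a lexicalised entry if there is one; otherwise fall back to
    dynamic compounding, keeping the segmentations with the fewest parts
    (an assumed ranking heuristic)."""
    if word in lexicon:
        return [[word]]
    segs = [s for s in segmentations(word, lexicon) if len(s) >= 2]
    if not segs:
        return []
    fewest = min(len(s) for s in segs)
    return [s for s in segs if len(s) == fewest]


# Without 'bildeliste', two three-part segmentations tie (ambiguous):
print(analyse("bildelistegenerator", PARTS))
# -> [['bilde', 'liste', 'generator'], ['bildel', 'iste', 'generator']]

# Lexicalising 'bildeliste' leaves a single two-part segmentation:
print(analyse("bildelistegenerator", PARTS | {"bildeliste"}))
# -> [['bildeliste', 'generator']]
```

In this toy, `segmentations` explores every split point recursively, which is the extra per-word work described above, while a lexicalised hit is a single membership test.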
Tour de force. This should be on the Wiki!

Fran

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
