Hi there. A quick remark; I may go through the rest of the message later.

On 11/15/2011 12:16 AM, Kevin Donnelly wrote:
> Re Fran's trivial stemming being OK for a tagger, but not for an MT system,
> ---------------------------------------------------------------------------
> this is indeed a valid point, so the suggestion may not be viable as far as MT
> goes.
>
> However, it is not entirely impractical. I can envisage something like the
> following, which assumes that the monodixes will have entries for surface and
> lemma. Taking the relatively rare word "conductress", the process might be as
> follows:
> 1. "conductress" is not in the surface column of the English monodix.
> 2. so, change -ress to -or+f (using a set of regex lookups appropriate to the
>    language)
> 3. is "conductor" in the surface column of the English monodix?
> 4. yes, so find its equivalent noun in the other language in the bidix.
> 5. find that equivalent in the other language's monodix
> 6. is this equivalent marked f in the gender column?
> 7. no, so see if there are other noun items with the same lemma
> 8. are any of them marked f?
> 9. if so, choose that.
> 10. if not, use the original find
> The lemma might hold the masculine singular form of nouns and adjectives, or
> the infinitive of verbs (or in the case of Swahili loan-words from Arabic, the
> Arabic 3-letter stem) - this is one of the things that might be decided per
> language or language-group.
>
> In theory this should work, and the main benefit would be to enable guesses to
> be made about the meaning even if the word is not in the dictionary. For
> instance, the diminutive -ito/a/os/as in Spanish seems to be frequently used
> in Latin American Spanish, and since it is both regular and productive, it is
> nugatory to enter words with it into the dictionary (since in effect the
> number of words it could be used with is extremely large). Using the above
> process would generate an English equivalent even if it were not in the
> dictionary, and if it were considered desirable to carry across the
> diminutive meaning (which in most cases is not really necessary), you could
> have another set of lookups as a post-processor on the other side. In
> English, perhaps something like "[small]" could be added for nouns,
> "[rather]" for adjectives, e.g. tiempito - [small] time, bajitos - [rather]
> low.
>
> I accept, though, that this might affect the speed of the translation, which
> may not be desirable, and that you may get some false positives.

This is basically how ispell and other spell checkers work (though,
granted, only for suffix morphology). In fact, there was a Portuguese group
that built something called jspell, which did output morphological
information as part of spell checking. I think their GPL Portuguese
dictionary was used for the Portuguese dictionaries in Apertium, but I am
not sure.

Two problems come to mind:

(1) Many of Kevin's transformation rules can match an entry; this may be
computationally more intensive than the finite-state transducers used in
Apertium. However, maybe the rules don't have to be run at runtime: they
could be applied in bulk at compile time to generate a .dix. I would have
to think harder about this.

(2) You may get lexical forms which have no match in the bilingual
dictionary! The only way to avoid this would be to mark some of the surface
forms as lemmas and make sure that none of them is missing from the
dictionary.

An advantage: the PoS tagger gets better information than just "unknown
word".

On other matters, I had to search for "tupp'orth" (= "twopence worth",
apparently, in the sense of "my [humble] opinion"). First-language English
speakers always have an advantage over us second-language speakers... :-(

Mikel

-- 
Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03071 Alacant, Spain
Phone: +34 96 590 9776
Fax: +34 96 590 9326
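
For concreteness, the ten-step lookup that Kevin describes could be sketched
roughly as follows. This is only an illustration in Python under invented
assumptions: the toy dictionaries (EN_MONODIX, ES_MONODIX, BIDIX), the single
suffix rule and the function name translate_unknown are all made up for the
example, and none of it is Apertium's actual data format or API. Spanish is
used as the "other language" purely as an example.

```python
import re

# Hypothetical toy data, invented for this sketch (not real Apertium dictionaries).
# Monolingual dictionaries map surface form -> (lemma, part of speech, gender).
EN_MONODIX = {
    "conductor": ("conductor", "n", None),
    "actor": ("actor", "n", None),
}
ES_MONODIX = {
    "conductor": ("conductor", "n", "m"),
    "conductora": ("conductor", "n", "f"),
    # "actriz" is deliberately left out to show the fallback in step 10.
    "actor": ("actor", "n", "m"),
}
# Bilingual dictionary: English lemma -> Spanish surface form.
BIDIX = {"conductor": "conductor", "actor": "actor"}

# Suffix rules: (pattern, replacement, gender implied by the stripped suffix).
EN_SUFFIX_RULES = [(re.compile(r"ress$"), "or", "f")]


def translate_unknown(word):
    """Guess a translation for `word`, following steps 1-10 of the quoted process."""
    if word in EN_MONODIX:  # step 1: known words take the ordinary path
        return BIDIX.get(EN_MONODIX[word][0])
    for pattern, replacement, gender in EN_SUFFIX_RULES:
        candidate, hits = pattern.subn(replacement, word)  # step 2
        if hits == 0 or candidate not in EN_MONODIX:       # step 3
            continue
        target = BIDIX.get(EN_MONODIX[candidate][0])       # step 4
        if target is None or target not in ES_MONODIX:     # step 5
            continue
        lemma, pos, target_gender = ES_MONODIX[target]
        if target_gender == gender:                        # step 6
            return target
        # steps 7-9: look for another entry with the same lemma and the wanted gender
        for surface, (other_lemma, other_pos, other_gender) in ES_MONODIX.items():
            if (other_lemma, other_pos, other_gender) == (lemma, pos, gender):
                return surface
        return target                                      # step 10: original find
    return None  # no rule produced a known form
```

With this toy data, translate_unknown("conductress") gives "conductora", while
translate_unknown("actress") falls back to "actor" because no feminine entry
shares that lemma. The same rule loop could in principle be run once over the
whole monodix at compile time instead of per word at runtime, which is the
variant mentioned under (1) above.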
