On 12 November 2011 21:15, Kevin Donnelly <[email protected]> wrote: [SNIP]
> In effect, you are splitting words artificially (not along linguistically-accepted
> lines) on input, so that you can put them back together again at lookup. It would
> be simpler just to enter and look up a full-form word.

Given your complaints, elsewhere in the email, about Spanish enclitic pronouns, I have to wonder what, specifically, you are referring to here.

As you mention full-form words, perhaps you're not aware that paradigms are not obligatory? We could just as easily stick full-form lists in the XML, and they would compile just as well as entries with paradigms. What's more, the compiled binary representations of both would be identical.

So if your concern is that, for an entry consisting of, say, the string "deput" plus the paradigm "bab/y__n", the runtime first looks up "deput" and then looks up some abstract representation of the paradigm... let me assure you that this is not the case.

If, on the other hand, you're referring to how we segment something like dímelo into decir+me+lo... saying that it's "not along linguistically-accepted lines" may be a neat rhetorical device, but it's not true.

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
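A minimal Python sketch of the point about paradigms versus full-form entries, under stated assumptions: the contents given here for "bab/y__n" (surface "y" for the singular, "ies" for the plural) and the tag strings are assumed for illustration, and a plain character trie stands in for the real compiled format (lttoolbox actually builds minimised letter transducers, not this structure). The stem+paradigm entry is expanded into full forms before compilation, so both dictionaries compile to the same structure, and lookup walks the whole surface form in a single pass instead of finding a stem and then consulting a paradigm.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical paradigm table, modelled loosely on "bab/y__n":
# each pair is (surface suffix, lexical suffix with tags). Assumed, not real lttoolbox data.
PARADIGMS: Dict[str, List[Tuple[str, str]]] = {
    "bab/y__n": [("y", "y<n><sg>"), ("ies", "y<n><pl>")],
}

# Key marking a final node; the empty string can never equal a single input character.
FINAL = ""


def expand(stem: str, paradigm: str) -> List[Tuple[str, str]]:
    """Expand a stem+paradigm entry into full-form (surface, analysis) pairs."""
    return [(stem + surf, stem + lex) for surf, lex in PARADIGMS[paradigm]]


def compile_entries(pairs: Iterable[Tuple[str, str]]) -> dict:
    """Compile (surface, analysis) pairs into a character trie (stand-in for a transducer)."""
    root: dict = {}
    for surface, analysis in pairs:
        node = root
        for ch in surface:
            node = node.setdefault(ch, {})
        node.setdefault(FINAL, set()).add(analysis)
    return root


def lookup(trie: dict, word: str) -> Set[str]:
    """Walk the whole surface form in one pass; no separate stem/paradigm step."""
    node = trie
    for ch in word:
        if ch not in node:
            return set()
        node = node[ch]
    return set(node.get(FINAL, set()))


paradigm_dict = compile_entries(expand("deput", "bab/y__n"))
full_form_dict = compile_entries(
    [("deputy", "deputy<n><sg>"), ("deputies", "deputy<n><pl>")]
)

print(paradigm_dict == full_form_dict)    # True
print(lookup(paradigm_dict, "deputies"))  # {'deputy<n><pl>'}
```

Because expansion happens at compile time, nothing about the paradigm survives into the compiled structure; a real compiler would additionally minimise the automaton, which only makes the two results more obviously the same.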
