Hey all,

In preparing my test corpus for experiments with lexical selection, I've done a ~30,000 word evaluation of Catalan->English, and have come up with the following list of observations:
https://apertium.svn.sourceforge.net/svnroot/apertium/branches/apertium-en-ca/dev/observations.ca-en.txt

They include multiwords, transfer rules, missing morphology, and lexical rules, but mainly multiwords. I don't have time to make the changes, but maybe someone on the list is interested.

The current error rate is around 45% according to my calculations, but the texts haven't been properly checked yet.

Fran

PS. I have started to write a page for discussion of the lexical selection module here:

http://wiki.apertium.org/wiki/Constraint-based_lexical_selection_module

I would appreciate input on the talk page.

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
