On Thu, 20 Sep 2012 at 09:07 +0200, Per Tunedal wrote:
> Hi,
> It would be interesting to know more about how to auto-trim a
> monolingual dictionary to the words present in the bidix. It would be
> highly appropriate for my work on Norwegian bokmål (nb) to Swedish (se).
> And, of course, for "correcting" the pair Danish (da) to Swedish (se).
> I've thought of commenting out offending entries with some clever script
> and/or keeping a full dix in parallel. I don't want to lose the full
> dictionaries, as I hope the bidix will gradually grow.
See, for example, the apertium-af-nl[1] (Afrikaans and Dutch) pair. The full Dutch dictionary and the script(s) for trimming are in nl/

1. https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-af-nl

I can't guarantee that it will work without modification, but it should give a reasonable idea of how to go about it.

> BTW, I'm impressed by your work: taking on such a complicated task and
> managing to accomplish it. Apparently, Apertium is very useful for
> understanding small languages. Statistical approaches would probably be
> out of the question.

Well, there is a medium-sized parallel corpus for Sámi, but I don't think any SMT system could hope to get anywhere near the same kind of coverage working only with surface forms.

And thanks for the compliment too!

Fran

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
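For illustration only, here is a minimal Python sketch of the trimming idea. It is NOT the script from apertium-af-nl, and it rests on a few assumptions: monodix entries carry an `lm` attribute whose value matches the lemma text in the bidix; the language being trimmed is on the `<l>` side of the bidix (pass `side="r"` for the other language); and the result is written to a separate file, so the full dictionary is kept alongside it, as Per wanted. The script name, function names and example file names are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: trim an Apertium monodix to the lemmas in a bidix."""
import sys
import xml.etree.ElementTree as ET


def parse(xml):
    # Keep <!-- comments --> inside the root so commented-out entries survive
    # the rewrite (TreeBuilder(insert_comments=True) needs Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    return ET.fromstring(xml, parser=parser)


def side_text(elem):
    """Surface text of an <l>, <r> or <i> element.

    <s n="..."/> tags carry no text and are skipped; <b/> is a blank, and
    the contents of <g> (multiword groups) are included recursively.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "b":
            parts.append(" ")
        elif child.tag == "g":
            parts.append(side_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def bidix_lemmas(bidix_xml, side="l"):
    """Lemmas on one side ("l" = left language, "r" = right) of a bidix.

    Direction restrictions such as r="LR" are ignored in this sketch.
    """
    root = parse(bidix_xml)
    lemmas = set()
    for section in root.iter("section"):  # skips entries inside <pardefs>
        for e in section.iter("e"):
            for p in e.iter("p"):
                node = p.find(side)
                if node is not None:
                    lemmas.add(side_text(node).strip())
            for i in e.iter("i"):  # identity entries count for both sides
                lemmas.add(side_text(i).strip())
    return lemmas


def trim_monodix(monodix_xml, keep):
    """Drop section entries whose lm is not in keep; return (xml, n_removed).

    Entries without an lm attribute (punctuation, regexps, ...) and the
    paradigms in <pardefs> are left untouched.
    """
    root = parse(monodix_xml)
    removed = 0
    for section in list(root.iter("section")):
        for e in section.findall("e"):  # findall returns a list: safe to remove
            lm = e.get("lm")
            if lm is not None and lm not in keep:
                section.remove(e)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example (hypothetical file names):
    #   python3 trim_monodix.py apertium-nb-sv.nb-sv.dix apertium-nb.nb.dix > nb.trimmed.dix
    with open(sys.argv[1], "rb") as f:
        keep = bidix_lemmas(f.read(), side="l")
    with open(sys.argv[2], "rb") as f:
        trimmed, n = trim_monodix(f.read(), keep)
    sys.stdout.write(trimmed + "\n")
    print(f"removed {n} entries", file=sys.stderr)
```

Writing to a new file rather than editing in place means the full monodix stays untouched and the trim can simply be rerun as the bidix grows. Note that the XML declaration is not reproduced in the output, and that in real dictionaries `lm` values may not always match the bidix text exactly (case, multiword conventions), so the removed entries are worth checking by hand.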
