On Thu, 20 Sep 2012 at 09:07 +0200, Per Tunedal wrote:
> Hi,
> It would be interesting to know more about how to auto-trim a
> monolingual dictionary to the words present in the bidix. It would be
> highly appropriate for my work on Norwegian bokmål (nb) to Swedish (sv).
> And, of course, for "correcting" the pair Danish (da) to Swedish (sv).
> I've thought of commenting out offending entries with some clever script
> and/or keeping a full dix in parallel. I don't want to lose the full
> dictionaries, as I hope the bidix will gradually be extended.

See, for example, the apertium-af-nl[1] (Afrikaans and Dutch) pair. The
full Dutch dictionary and the script(s) for trimming are in nl/.

1.
https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-af-nl

I can't guarantee that it will work without modification. But it should
give a reasonable idea about how to go about it.
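
To make the idea concrete, here is a rough sketch of lemma-based trimming.
It is not the script from nl/; it only assumes the usual dix layout
(bidix entries with <l>/<r> sides, monodix entries carrying an lm
attribute). The function names and the toy data are made up for
illustration, and matching on the lemma alone (ignoring tags and
paradigms) is a deliberate simplification.

```python
import xml.etree.ElementTree as ET

# Toy data, invented for illustration only.
BIDIX = """<dictionary><section id="main" type="standard">
<e><p><l>hus<s n="n"/></l><r>hus<s n="n"/></r></p></e>
<e><p><l>i<b/>dag<s n="adv"/></l><r>i<b/>dag<s n="adv"/></r></p></e>
</section></dictionary>"""

MONODIX = """<dictionary><section id="main" type="standard">
<e lm="hus"><i>hus</i><par n="hus__n"/></e>
<e lm="katt"><i>katt</i><par n="katt__n"/></e>
<e lm="i dag"><i>i<b/>dag</i><par n="i_dag__adv"/></e>
</section></dictionary>"""


def side_lemma(node):
    """Return the lemma text of an <l> or <r> element.

    <b/> is read as a space; other children (such as <s n="..."/>)
    contribute nothing except any text that follows them.
    """
    parts = [node.text or ""]
    for child in node:
        if child.tag == "b":
            parts.append(" ")
        parts.append(child.tail or "")
    return "".join(parts).strip()


def bidix_lemmas(bidix_root, side):
    """Collect the lemmas on one side ("l" or "r") of the bidix."""
    return {side_lemma(node) for node in bidix_root.iter(side)}


def trim_monodix(monodix_root, keep):
    """Remove <e lm="..."> entries whose lemma is not in `keep`.

    Entries without an lm attribute are left alone, and <pardef>
    elements are never touched. Returns the number of removed entries.
    """
    removed = 0
    for section in monodix_root.iter("section"):
        for entry in list(section.findall("e")):
            lemma = entry.get("lm")
            if lemma is not None and lemma not in keep:
                section.remove(entry)
                removed += 1
    return removed


if __name__ == "__main__":
    bidix = ET.fromstring(BIDIX)
    monodix = ET.fromstring(MONODIX)
    keep = bidix_lemmas(bidix, "r")  # "r" = target-language side here
    print("removed:", trim_monodix(monodix, keep))
    print([e.get("lm") for e in monodix.iter("e")])
```

Because the trimmed tree is built from the full dictionary each time, the
full dix stays untouched on disk and the trimmed one can be regenerated
whenever the bidix grows.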

> BTW I'm impressed by your work. To take on such a complicated task and
> manage to accomplish it. Apparently, Apertium is very useful for
> understanding small languages. Statistical approaches would probably be
> out of the question.

Well, there is a medium-sized parallel corpus for Sámi -- but I don't
think any SMT system could hope to have anywhere near the same kind of
coverage just working with surface forms. And thanks for the compliment
too! 

Fran


_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
