Hi,

I've successfully extracted a Swedish word list from apertium-sv-da.sv.dix as follows:

    lt-expand apertium-sv-da.sv.dix | cut -f1 -d':' > apertium-sv-da.sv.dix.expanded

Going through the list, I found lots of errors. To get a shorter list of likely misspellings, I excluded the words present in the Aspell dictionary. The result was still quite long, and worse, it consisted mostly of correctly spelled words that Aspell simply does not know. Hunspell (used by e.g. OpenOffice/LibreOffice) knows many more words. Does anyone happen to know how to extract or obtain Hunspell word lists as text files?

Looking at the list of misspellings, I realised that many of "the errors" are variants added for analysis only (r="LR"). Is there an easy way to expand only the variants that are used for generation? Such a procedure would produce a much shorter and more accurate list.

Anyhow, I continued by checking the list in word-processing programs to find the real errors, and found quite a lot. I have already corrected some of them in the sv-da pair. What about the separate language dictionary? Should I merge my corrections into it somehow? What is the recommended procedure when improving or adding to an existing language pair?

By the way: how do I use the separate single-language monodixes? Can they be used for existing pairs, or only when creating new pairs? What is the recommendation for new pairs? The "Apertium New Language Pair HOWTO" still assumes that the monodixes are made exclusively for the new pair.

Yours,
Per Tunedal

_______________________________________________
Apertium-stuff mailing list
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
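
A minimal sketch of the two filtering steps discussed in the message: keeping only forms that can be generated, then dropping words that a reference word list already contains. It rests on an assumption that should be checked against the installed lttoolbox version: that lt-expand marks analysis-only (r="LR") entries with `:>:` and generation-only (r="RL") entries with `:<:`. The function names and the file `known-words.txt` are hypothetical placeholders, not part of Apertium.

```shell
# Read lt-expand output on stdin and print the unique surface forms that
# can be generated. ASSUMPTION: analysis-only (r="LR") lines contain ':>:'.
generation_forms() {
    # Drop analysis-only lines, keep the text before the first ':',
    # and sort bytewise so the result does not depend on the locale.
    grep -v ':>:' | cut -d':' -f1 | LC_ALL=C sort -u
}

# Read words on stdin and print those that do not appear in the word
# list given as $1 (one word per line, matched as whole fixed strings).
unknown_words() {
    grep -vxF -f "$1"
}

# Possible usage (known-words.txt is a hypothetical plain-text word list):
#   lt-expand apertium-sv-da.sv.dix | generation_forms | unknown_words known-words.txt
```

Surface forms that themselves contain a colon would be truncated by the `cut` step, so this is only suitable as a first pass over the list.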
