Hi,

I've just tagged new versions for release of the Scandinavian pairs; they should be heading up to apertium.org and GitHub soon.
As before[1], the work comes courtesy of Nynorsk pressekontor / NPK and the Norwegian News Agency / NTB, with funding from the Norwegian Ministry of Culture; this fall they also hired Anja[2] to help out with nno-nob. NPK has been using apertium-nno-nob successfully for over a year now in order to create more Nynorsk news content.[3]

Some changes since last March in nno-nob:

- ~600 new names and more than 2000 new non-names added to bidix
- 270 new lrx rules (and we fixed an lrx-proc bug that would sometimes let the wrong rule apply)
- 37 new transfer rules, including better handling of coordinations, genitives and passives
- corpus-generated bigram rules for choice of preposition when rewriting genitives to prepositional phrases
- compounding on digits
- many fixed expressions added
- many compound epenthetics fixed, partly automatically from corpus analyses
- support for using headline markup in disambiguation (if apertium-deshtml uses the -o switch)
- more consistent upper/lower-case handling (required a fix[4] to cg-proc)
- lots more work on Bokmål disambiguation (which of course helps any pair translating from nob), including some frequency-based fallback rules generated from corpus. The rlx file is about 2500 lines longer … and split into two in order to do some sentence segmentation first.

In the previous release we had a median WER just below 7; now it is below 4 (the median of 1898 WER tests on 1898 NTB news articles is 3.77 when comparing post-edits to their inputs; stddev 4.73).

The other Scandinavian pairs and monolingual dependencies have gotten maintenance releases. There aren't many changes there, though all have some new words, and passives should behave a bit better in nor→dan.
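For anyone who wants to reproduce that kind of WER number on their own post-edits, here is a minimal sketch in Python. It is not the script used for the figures above. It assumes WER is the word-level edit distance between the raw translation and its post-edit, divided by the length of the post-edit and given as a percentage, with plain whitespace tokenization. The function names `wer` and `summarize` are made up for this example, and whether the stddev above is the sample or the population kind is not something this sketch knows; it uses the sample one.

```python
import statistics


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length.

    Assumption: the reference is the post-edited text and the hypothesis
    is the raw MT output; tokens are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                          # word missing from hypothesis
                cur[j - 1] + 1,                       # extra word in hypothesis
                prev[j - 1] + (ref_word != hyp_word)  # match or substitution
            ))
        prev = cur
    return 100.0 * prev[-1] / len(ref)


def summarize(scores):
    """Return (median, sample standard deviation) of per-article WER scores."""
    return statistics.median(scores), statistics.stdev(scores)
```

Running `wer` over each (post-edit, MT output) pair and passing the list of scores to `summarize` gives one median and one stddev for the whole test set.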
-Kevin

[1] https://sourceforge.net/p/apertium/mailman/message/36609798/
[2] https://github.com/anjazp
[3] https://journalisten.no/karoline-riise-kristiansen-martin-eide-npk/jeg-opplever-at-det-er-gode-vilkar-for-nynorsk-om-dagen/382345
[4] https://github.com/TinoDidriksen/cg3/commit/492ecebff80d2bbc68742d01e9cba1c1891d2121