Hi,

I've just tagged new versions for release of the Scandinavian pairs;
they should be heading up to apertium.org and GitHub soon.

As before[1], the work comes courtesy of Nynorsk pressekontor / NPK and
the Norwegian News Agency / NTB, with funding from the Norwegian
Ministry of Culture; this fall they also hired Anja[2] to help out with
nno-nob. NPK has been using apertium-nno-nob successfully for over a
year now in order to create more Nynorsk news content.[3]

Some changes since last March in nno-nob:
- ~600 new names and more than 2000 new non-names added to bidix
- 270 new lrx rules (and we fixed an lrx-proc bug that would sometimes
  let the wrong rule apply)
- 37 new transfer rules, including better handling of coordinations,
  genitives and passives
- corpus-generated bigram-rules for choice of preposition when rewriting
  genitives to prepositional phrases
- compounding on digits
- many fixed expressions added
- many compound epenthetics fixed, partly automatically from corpus
  analyses
- support for using headline markup in disambiguation (if
  apertium-deshtml uses the -o switch)
- more consistent upper/lower-case handling (required a fix[4] to
  cg-proc)
- lots more work on Bokmål disambiguation (which of course helps any
  pair translating from nob), including some frequency-based fallback
  rules generated from corpus. The rlx file is about 2500 lines longer …
  and split into two in order to do some sentence segmentation first.

In the previous release we had a median WER just below 7; now it is
below 4 (median of 1898 WER tests on 1898 NTB news articles is 3.77
when comparing post-edits to their inputs; stddev 4.73).
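
For anyone who wants to run a similar measurement on their own
post-edits, here is a minimal sketch of how a per-article WER and its
median/stddev could be computed. It is not the script used for the
numbers above. It assumes whitespace tokenisation, treats the post-edit
as the reference and the raw translation as the hypothesis, and reports
WER as a percentage. The function names and the toy sentence pairs are
made up for illustration.

```python
from statistics import median, stdev


def word_error_rate(post_edit, mt_output):
    """Word-level edit distance divided by reference length, in percent.

    Assumption: the post-edited text is the reference and tokens are
    whitespace-separated.
    """
    ref = post_edit.split()
    hyp = mt_output.split()
    if not ref:
        raise ValueError("reference (post-edit) must not be empty")
    # Classic dynamic-programming Levenshtein distance over words,
    # keeping only the previous row to save memory.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return 100.0 * prev[-1] / len(ref)


def summarize(pairs):
    """Return (median, sample stddev) of WER over (post_edit, mt_output) pairs."""
    scores = [word_error_rate(post_edit, mt) for post_edit, mt in pairs]
    return median(scores), stdev(scores)


# Toy data only, one (post_edit, mt_output) pair per "article".
toy_pairs = [
    ("eg har ei bok", "eg har ein bok"),
    ("han kom i går", "han kom i går"),
    ("ho les avisa kvar dag", "ho les avisa dag"),
]

if __name__ == "__main__":
    med, sd = summarize(toy_pairs)
    print(f"median WER: {med:.2f}, stddev: {sd:.2f}")
```

With real data you would feed one pair per article, so that the median
is taken over per-article scores rather than over a single pooled score.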

The other Scandinavian pairs and monolingual dependencies have gotten
maintenance releases. There aren't many changes there, though all have
some new words, and passives should behave a bit better in nor→dan.


-Kevin

[1] https://sourceforge.net/p/apertium/mailman/message/36609798/
[2] https://github.com/anjazp
[3] https://journalisten.no/karoline-riise-kristiansen-martin-eide-npk/jeg-opplever-at-det-er-gode-vilkar-for-nynorsk-om-dagen/382345
[4] https://github.com/TinoDidriksen/cg3/commit/492ecebff80d2bbc68742d01e9cba1c1891d2121



_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
