On 2019-11-29 14:25, Kevin Brubeck Unhammer wrote:
Hi,
I've just tagged new versions for release of the Scandinavian pairs,
they should be heading up to apertium.org and Github soon.
As before[1], the work comes courtesy of Nynorsk pressekontor / NPK and
the Norwegian News Agency / NTB, with funding from the Norwegian
Ministry of Culture; this fall they also hired Anja[2] to help out with
nno-nob. NPK has been using apertium-nno-nob successfully for over a
year now in order to create more Nynorsk news content.[3]
Some changes in nno-nob since last March:
- ~600 new names and more than 2000 new non-names added to bidix
- 270 new lrx rules (and we fixed an lrx-proc bug that would sometimes
let the wrong rule apply)
- 37 new transfer rules, including better handling of coordinations,
genitives and passives
- corpus-generated bigram rules for choosing the preposition when
  rewriting genitives as prepositional phrases
- compounding on digits
- many fixed expressions added
- many compound epenthetics fixed, partly automatically from corpus
analyses
- support for using headline markup in disambiguation (if
apertium-deshtml uses the -o switch)
- more consistent upper/lower-case handling (required a fix[4] to
cg-proc)
- lots more work on Bokmål disambiguation (which of course helps any
  pair translating from nob), including some frequency-based fallback
  rules generated from corpus. The rlx file is about 2500 lines longer …
  and has been split into two so that some sentence segmentation can be
  done first.
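For anyone curious how "corpus-generated bigram rules" can work in
principle, here is a minimal Python sketch of the counting idea: pick the
most frequent preposition seen after each head noun, with a fallback for
rare or unseen nouns. This is not the code used in apertium-nno-nob (the
real rules live in the pair's own rule files); the (head noun,
preposition) input format, the `min_count` threshold, the function names
and the fallback preposition "til" are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def learn_preposition_rules(bigrams, min_count=2):
    """Map each head noun to its most frequent following preposition.

    `bigrams` is an iterable of (head_noun, preposition) pairs, assumed
    (hypothetically) to be extracted from a corpus of prepositional phrases.
    Nouns whose best preposition was seen fewer than `min_count` times get
    no rule, so they use the fallback at lookup time.
    """
    counts = defaultdict(Counter)
    for noun, prep in bigrams:
        counts[noun][prep] += 1

    rules = {}
    for noun, preps in counts.items():
        prep, n = preps.most_common(1)[0]
        if n >= min_count:
            rules[noun] = prep
    return rules


def choose_preposition(rules, head_noun, default="til"):
    # "til" as fallback is an assumption, not taken from the real pair.
    return rules.get(head_noun, default)


corpus = [("sjef", "for"), ("sjef", "for"), ("sjef", "i"),
          ("leiar", "i"), ("leiar", "i"), ("eigar", "av")]
rules = learn_preposition_rules(corpus)
print(rules)                               # {'sjef': 'for', 'leiar': 'i'}
print(choose_preposition(rules, "eigar"))  # til (only seen once)
```

The threshold keeps one-off corpus noise from becoming a rule, which is
the same reasoning behind using a fallback at all.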
In the previous release we had a median WER just below 7; now it is
below 4 (the median of 1898 WER tests on 1898 NTB news articles,
comparing post-edits to their inputs, is 3.77; stddev 4.73).
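The WER numbers above can be reproduced in spirit with a sketch like the
following, assuming WER means the word-level edit distance between the
machine-translated input and its post-edit, divided by the length of the
post-edit and given as a percentage. The tool actually used for the NTB
tests isn't named here, so `word_error_rate` and `summarize` are
hypothetical helpers, and `statistics.stdev` gives the sample standard
deviation, which may differ from how the 4.73 figure was computed.

```python
import statistics


def word_error_rate(mt_output, post_edit):
    """WER in percent: word-level Levenshtein distance / post-edit length."""
    hyp = mt_output.split()
    ref = post_edit.split()
    # prev[j] holds the edit distance between hyp[:i] and ref[:j]
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(prev[j] + 1,             # delete h from the MT output
                         cur[j - 1] + 1,          # insert r from the post-edit
                         prev[j - 1] + (h != r))  # substitute, or keep a match
        prev = cur
    return 100.0 * prev[len(ref)] / max(len(ref), 1)


def summarize(pairs):
    """Median and sample stddev of WER over (mt_output, post_edit) pairs."""
    scores = [word_error_rate(mt, post) for mt, post in pairs]
    return statistics.median(scores), statistics.stdev(scores)


print(word_error_rate("a b c d", "a x c d"))  # 25.0 (one substitution in 4 words)
```

Using the median rather than the mean keeps a few heavily rewritten
articles from dominating the headline figure, which fits the large
stddev relative to the median.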
The other Scandinavian pairs and monolingual dependencies have received
maintenance releases. There aren't many changes there, though all have
some new words, and passives should behave a bit better in nor→dan.
Congrats! That's great news :D
Fran
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff