Hello all
This is to announce that Apertium French->Esperanto is ready to be
published.
*Coverage*
- Wikipedia: 93.4%
- L'Est Républicain: 94.3% (http://www.cnrtl.fr/corpus/estrepublicain/)
*Dictionary stats*
- French dictionary: 44,633
- French-Esperanto dictionary: 43,016
*Rule stats*
The translator makes extensive use of n-stage transfer (
http://wiki.apertium.org/wiki/N-Stage_transfer ). The interchunk stage has
been divided into six steps.
- t1x: 352
- interchunk:
  - t2x1: 86
  - t2x2: 33
  - t2x3: 11
  - t2x4: 6
  - t2x5: 40
  - t2x6: 2
- t3x: 1
*Evaluation*
The translator has been evaluated on two sets of 100 randomly selected
sentences, one from the Wikipedia corpus and one from the L'Est Républicain
corpus.
Wikipedia corpus:
Statistics about input files
-------------------------------------------------------
Number of words in reference: 2374
Number of words in test: 2464
Number of unknown words (marked with a star) in test: 203
Percentage of unknown words: 8,24 %
Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 532
Word error rate (WER): 22,41 %
Number of position-independent correct words: 1976
Position-independent word error rate (PER): 20,56 %
L'Est Républicain corpus:
Statistics about input files
-------------------------------------------------------
Number of words in reference: 1755
Number of words in test: 1809
Number of unknown words (marked with a star) in test: 143
Percentage of unknown words: 7,90 %
Results when removing unknown-word marks (stars)
-------------------------------------------------------
Edit distance: 417
Word error rate (WER): 23,76 %
Number of position-independent correct words: 1436
Position-independent word error rate (PER): 21,25 %
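The percentages in both reports can be re-derived from the raw counts. Below is a minimal sketch, assuming that WER is the edit distance divided by the number of reference words, and that PER is (words in test - position-independent correct words) divided by the number of reference words. The PER formula is an assumption, not something stated in the reports; it was chosen because it reproduces the figures above. The function names are hypothetical.

```python
def pct(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to two decimals."""
    return round(100.0 * numerator / denominator, 2)


def metrics(ref_words, test_words, unknown, edit_distance, pi_correct):
    """Recompute the reported percentages from the raw counts.

    Assumed formulas (not confirmed by the reports themselves):
      unknown % = unknown words / words in test
      WER       = edit distance / words in reference
      PER       = (words in test - position-independent correct) / words in reference
    """
    return {
        "unknown": pct(unknown, test_words),
        "wer": pct(edit_distance, ref_words),
        "per": pct(test_words - pi_correct, ref_words),
    }


# Raw counts copied from the two reports above.
wikipedia = metrics(2374, 2464, 203, 532, 1976)
est_republicain = metrics(1755, 1809, 143, 417, 1436)

print(wikipedia)        # reported: 8,24 % / 22,41 % / 20,56 %
print(est_republicain)  # reported: 7,90 % / 23,76 % / 21,25 %
```

Both sets of recomputed values match the reported ones to two decimals, so the figures in the reports are internally consistent under these assumptions.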
I'd be glad if the translator could be published on the Apertium web site,
or on an alternative project site, as the Apertium site is currently under
reconstruction. Please note that Apertium version 3.2 or higher is needed.
Hèctor