On 2018-06-06 22:10, Grzegorz Kulik wrote:
On 06.06.2018 02:54, Francis Tyers wrote:
On 2018-06-05 20:27, Grzegorz Kulik wrote:
On 05.06.2018 16:01, Francis Tyers wrote:
On 2018-06-05 14:23, Grzegorz Kulik wrote:
Hi, sorry for the late response. :)
On 01.06.2018 22:58, Francis Tyers wrote:
On 2018-06-01 20:59, Grzegorz Kulik wrote:
On 01.06.2018 16:55, Francis Tyers wrote:
On 2018-06-01 14:05, Grzegorz Kulik wrote:
On 01.06.2018 00:14, Francis Tyers wrote:
On 2018-05-31 23:36, Grzegorz Kulik wrote:
Okay, I've transferred apertium-szl and apertium-pol-szl to Apertium
on GitHub.
[..snip..]
Have you calculated the coverage for both dictionaries?
Never thought about it, so I put together an ad hoc Polish corpus made
from random Wikipedia articles and followed the steps explained in the
Wiki. This is what I got:
79.265 % known tokens (543957 unknown, 0 bidix-unknown of total 2623413 tokens)
Did you compare that against the existing Apertium dictionary
(apertium-pol)?
Not sure what you mean. I followed this:
http://wiki.apertium.org/wiki/Calculating_coverage
I used apertium-pol made by me.
Did you try the existing apertium-pol too?
I didn't because it won't compile.
It compiles for me...
fran@matxine:~/source/apertium/languages/apertium-pol$ make
lt-comp lr apertium-pol.pol.dix pol.automorf.bin
main@standard 51576 111960
multiwords@standard 3292 3699
tokens@inconditional 23 139
lt-comp rl apertium-pol.pol.dix pol.autogen.bin
main@standard 51406 111414
multiwords@standard 3276 3675
tokens@inconditional 23 139
lt-comp lr apertium-pol.post-pol.dix pol.autopgen.bin
main@inconditional 42 77
lt-print pol.automorf.bin | gzip -9 -c -n > pol.automorf.att.gz
lt-print pol.autogen.bin | gzip -9 -c -n > pol.autogen.att.gz
/usr/bin/cg-comp apertium-pol.pol.rlx pol.rlx.bin
Sections: 2, Rules: 15, Sets: 48, Tags: 68
4 rules cannot be skipped by index.
apertium-validate-modes modes.xml
apertium-gen-modes modes.xml
Are you using master/?
That's... odd. I tried again with the files I pushed to GitHub and
they compiled with no problems. I must have done something wrong before.
Maybe I downloaded files from SVN?
Anyway, using the same corpus I got:
71.76 % known tokens (742530 unknown, 0 bidix-unknown of total 2629387 tokens)
Which is lower than mine.
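For anyone who wants to check the two figures, here is a minimal sketch of the arithmetic, assuming coverage is simply the share of tokens that are not unknown. The function name `coverage` is made up for illustration and is not part of any Apertium tool; the counts are the ones reported above.

```python
def coverage(total_tokens: int, unknown_tokens: int) -> float:
    """Percentage of tokens that are not reported as unknown."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 100.0 * (total_tokens - unknown_tokens) / total_tokens


# Counts taken from the two runs quoted in this thread.
new_pol = coverage(2623413, 543957)   # apertium-pol pushed to GitHub
old_pol = coverage(2629387, 742530)   # existing apertium-pol

print(f"new: {new_pol:.3f} %  old: {old_pol:.2f} %")
print(f"difference: {new_pol - old_pol:.2f} percentage points")
```

Note that the two runs report slightly different token totals for the same corpus, so the comparison is between percentages rather than raw unknown counts.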
Ok, then I think we have our answer, let's just use yours. Although
I'd ask if you could switch "imperf" -> "impf" :D ... to be compatible
with other Slavic languages.
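One possible way to do the rename, as a rough sketch only. It assumes the tag shows up as an `n="imperf"` attribute in the dictionary XML (for example in `<sdef>` and `<s>` elements); the helper name `rename_imperf` is hypothetical, and other files such as the .rlx rules may spell the tag differently and would need their own check.

```python
import re

# Assumption: the tag appears as n="imperf" in .dix-style XML.
# The closing quote in the pattern keeps longer names untouched.
_IMPERF_ATTR = re.compile(r'\bn="imperf"')


def rename_imperf(text: str) -> str:
    """Return text with every n="imperf" attribute rewritten to n="impf"."""
    return _IMPERF_ATTR.sub('n="impf"', text)


# Made-up sample lines, not copied from the real dictionary.
sample = '<sdef n="imperf"/>\n<s n="vblex"/><s n="imperf"/>'
print(rename_imperf(sample))
```

Running it over a copy of the file first and diffing the result before committing would be the cautious route.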
Oh, didn't know about the difference. Sure, I'll change it. Have you
noticed anything else?
Nothing that I can tell yet.
F.
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff