On 2018-06-06 22:10, Grzegorz Kulik wrote:
On 06.06.2018 02:54, Francis Tyers wrote:
On 2018-06-05 20:27, Grzegorz Kulik wrote:
On 05.06.2018 16:01, Francis Tyers wrote:
On 2018-06-05 14:23, Grzegorz Kulik wrote:
Hi, sorry for the late response. :)


On 01.06.2018 22:58, Francis Tyers wrote:
On 2018-06-01 20:59, Grzegorz Kulik wrote:
On 01.06.2018 16:55, Francis Tyers wrote:
On 2018-06-01 14:05, Grzegorz Kulik wrote:
On 01.06.2018 00:14, Francis Tyers wrote:
On 2018-05-31 23:36, Grzegorz Kulik wrote:
Okay, I've transferred apertium-szl and apertium-pol-szl to Apertium
on GitHub.

[..snip..]

Have you calculated the coverage for both dictionaries?

I had never thought about it, so I put together an ad hoc Polish corpus from random Wikipedia articles and followed the steps explained in the
wiki. This is what I got:

79.265 % known tokens (543957 unknown, 0 bidix-unknown of total 2623413 tokens)

Did you compare that against the existing Apertium dictionary (apertium-pol)?

Not sure what you mean. I followed this:

http://wiki.apertium.org/wiki/Calculating_coverage

I used my own version of apertium-pol.

Did you try the existing apertium-pol too?

I didn't, because it won't compile.
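
As a rough illustration of what a coverage figure like the one above measures, here is a minimal Python sketch that counts known and unknown tokens in analyser output. It assumes the text has already been run through the analyser and is in Apertium stream format, where each token looks like `^surface/analysis$` and an unknown word has an analysis starting with `*`. The function name `coverage`, the regular expression, and the sample sentence with its tags are made up for illustration, and escaped characters in the stream are not handled. This is not the script from the wiki page.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplified: escaped characters such as \/ or \$ are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def coverage(analysed):
    """Return (total, unknown, percent_known) for a string of analyser output."""
    total = 0
    unknown = 0
    for match in LEXICAL_UNIT.finditer(analysed):
        total += 1
        # An unknown word is emitted with '*' in front of its only "analysis".
        if match.group(2).startswith("*"):
            unknown += 1
    percent = 100.0 * (total - unknown) / total if total else 0.0
    return total, unknown, percent


# Hypothetical sample: two analysed tokens and one unknown token.
sample = (
    "^Ala/Ala<np><ant><f><sg><nom>$ "
    "^ma/mieć<vblex><impf><pri><p3><sg>$ "
    "^kota/*kota$"
)
total, unknown, percent = coverage(sample)
print(f"{percent:.3f} % known tokens ({unknown} unknown of total {total} tokens)")
```

On a real corpus the input would be the analyser's output for the whole file rather than a hand-written string.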


It compiles for me...

fran@matxine:~/source/apertium/languages/apertium-pol$ make
lt-comp lr apertium-pol.pol.dix pol.automorf.bin
main@standard 51576 111960
multiwords@standard 3292 3699
tokens@inconditional 23 139
lt-comp rl apertium-pol.pol.dix pol.autogen.bin
main@standard 51406 111414
multiwords@standard 3276 3675
tokens@inconditional 23 139
lt-comp lr apertium-pol.post-pol.dix pol.autopgen.bin
main@inconditional 42 77
lt-print pol.automorf.bin | gzip -9 -c -n > pol.automorf.att.gz
lt-print pol.autogen.bin | gzip -9 -c -n > pol.autogen.att.gz
/usr/bin/cg-comp apertium-pol.pol.rlx pol.rlx.bin
Sections: 2, Rules: 15, Sets: 48, Tags: 68
4 rules cannot be skipped by index.
apertium-validate-modes modes.xml
apertium-gen-modes modes.xml

Are you using master?

That's... odd. I tried again with the files I pushed to GitHub and they
compiled with no problems. I must have done something wrong before.
Maybe I downloaded the files from SVN?

Anyway, using the same corpus I got:

71.76 % known tokens (742530 unknown, 0 bidix-unknown of total 2629387 tokens)

Which is lower than mine.
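
To make the comparison concrete, the two percentages can be recomputed from the token counts reported in this thread. This is only a small Python sketch; the function name `percent_known` is made up here, and the only inputs are the unknown and total counts quoted above.

```python
def percent_known(unknown, total):
    """Share of tokens that received at least one analysis, in percent."""
    return 100.0 * (total - unknown) / total


# Counts as reported in the two coverage lines of this thread.
new_pol = percent_known(unknown=543957, total=2623413)
existing_pol = percent_known(unknown=742530, total=2629387)

print(f"new apertium-pol:      {new_pol:.3f} %")
print(f"existing apertium-pol: {existing_pol:.2f} %")
print(f"difference:            {new_pol - existing_pol:.2f} percentage points")
```

The gap comes out at roughly 7.5 percentage points in favour of the new dictionary. The token totals differ slightly between the two runs, presumably because the two analysers tokenise the same corpus differently, so the comparison is approximate.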

OK, then I think we have our answer: let's just use yours. Although I'd ask you to switch "imperf" -> "impf" :D ... to be compatible with the other Slavic languages.

Oh, I didn't know about the difference. Sure, I'll change it. Have you
noticed anything else?
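
A rename like this could be done with a short script. The Python sketch below rewrites the tag name where it appears as `<sdef n="imperf" .../>` or `<s n="imperf"/>` in dictionary XML. The function name `rename_tag` and the sample entry are hypothetical, and the sketch assumes the tag only needs changing in those two element forms; any other files that mention the tag would need the same change by hand or with a similar pattern.

```python
import re

# Matches n="imperf" inside <sdef .../> and <s .../> elements only,
# so free text such as a c="imperfective" comment is left alone.
TAG = re.compile(r'(<s(?:def)?\s+n=")imperf(")')


def rename_tag(dix_text):
    """Return dix_text with the symbol 'imperf' renamed to 'impf'."""
    return TAG.sub(r"\g<1>impf\g<2>", dix_text)


# Hypothetical dictionary fragment, for illustration only.
sample = (
    '<sdef n="imperf" c="imperfective"/>\n'
    '<e><p><l>robić</l><r>robić<s n="vblex"/><s n="imperf"/></r></p></e>'
)
print(rename_tag(sample))
```

Diffing the result against the original before committing would show whether anything unexpected was touched.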


Nothing that I can tell yet.

F.

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff