Re: [Apertium-stuff] Polish - Silesian pair

Francis Tyers Wed, 06 Jun 2018 14:13:15 -0700

On 2018-06-06 22:10, Grzegorz Kulik wrote:

On 06.06.2018 02:54, Francis Tyers wrote:

On 2018-06-05 20:27, Grzegorz Kulik wrote:

On 05.06.2018 16:01, Francis Tyers wrote:

On 2018-06-05 14:23, Grzegorz Kulik wrote:

Hi, sorry for the late response. :)
On 01.06.2018 22:58, Francis Tyers wrote:
On 2018-06-01 20:59, Grzegorz Kulik wrote:
On 01.06.2018 16:55, Francis Tyers wrote:
On 2018-06-01 14:05, Grzegorz Kulik wrote:
On 01.06.2018 00:14, Francis Tyers wrote:
On 2018-05-31 23:36, Grzegorz Kulik wrote:
Okay, I've transferred apertium-szl and apertium-pol-szl to Apertium
on Github.
[..snip..]
Have you calculated the coverage for both dictionaries ?
Never thought about it, so I put together an ad hoc Polish corpus made
from random Wikipedia articles and did the steps explained in the
Wiki. This is what I got:
79.265 % known tokens (543957 unknown, 0 bidix-unknown of total 2623413 tokens)
Did you compare that against the existing Apertium dictionary (apertium-pol)?
Not sure what you mean. I followed this:

http://wiki.apertium.org/wiki/Calculating_coverage

I used apertium-pol made by me.
Did you try the existing apertium-pol too ?
I didn't because it won't compile.


It compiles for me...

fran@matxine:~/source/apertium/languages/apertium-pol$ make
lt-comp lr apertium-pol.pol.dix pol.automorf.bin
main@standard 51576 111960
multiwords@standard 3292 3699
tokens@inconditional 23 139
lt-comp rl apertium-pol.pol.dix pol.autogen.bin
main@standard 51406 111414
multiwords@standard 3276 3675
tokens@inconditional 23 139
lt-comp lr apertium-pol.post-pol.dix pol.autopgen.bin
main@inconditional 42 77
lt-print pol.automorf.bin | gzip -9 -c -n > pol.automorf.att.gz
lt-print pol.autogen.bin | gzip -9 -c -n > pol.autogen.att.gz
/usr/bin/cg-comp apertium-pol.pol.rlx pol.rlx.bin
Sections: 2, Rules: 15, Sets: 48, Tags: 68
4 rules cannot be skipped by index.
apertium-validate-modes modes.xml
apertium-gen-modes modes.xml

Are you using master/ ?

That's... odd. I tried again with the files I pushed to Github and they
compiled with no problems. I must have done something wrong before.
Maybe I downloaded files from SVN?

Anyway, using the same corpus I got:

71.76 % known tokens (742530 unknown, 0 bidix-unknown of total 2629387 tokens)


Which is lower than mine.
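
For reference, both coverage figures quoted in this thread follow directly from the unknown and total token counts reported with them. A minimal sketch in Python; the function name `coverage` is made up for illustration and is not part of any Apertium tool:

```python
def coverage(unknown_tokens: int, total_tokens: int) -> float:
    """Return the percentage of tokens the analyser recognised."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    known_tokens = total_tokens - unknown_tokens
    return 100.0 * known_tokens / total_tokens


# Counts copied from the two results quoted in the thread.
new_pol = coverage(543957, 2623413)       # Grzegorz's apertium-pol
existing_pol = coverage(742530, 2629387)  # existing apertium-pol

print(f"new apertium-pol:      {new_pol:.3f} %")
print(f"existing apertium-pol: {existing_pol:.2f} %")
print(f"difference:            {new_pol - existing_pol:.2f} percentage points")
```

Note that the two runs report slightly different total token counts for the same corpus, so the comparison is between percentages, not between raw unknown counts.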

Ok, then I think we have our answer, let's just use yours. Although
I'd ask if you could switch "imperf" -> "impf" :D ... to be compatible
with other Slavic languages.
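
A rename like this can be scripted rather than done by hand. The sketch below is only an illustration: it assumes the tag appears in the dictionary as `<sdef n="imperf" .../>` and `<s n="imperf"/>` elements, and `rename_tag` is a hypothetical helper, not an Apertium utility. Other files in the pair that mention the tag (for example constraint grammar or transfer rules) would presumably need the same change and are not handled here.

```python
import re


def rename_tag(dix_text: str, old: str = "imperf", new: str = "impf") -> str:
    """Rename a symbol in <s n="..."/> and <sdef n="..."/> elements only."""
    # Match the whole attribute value, so that a tag such as "imperfx"
    # or the word "imperfective" in a comment is left untouched.
    pattern = re.compile(r'(<(?:s|sdef)\s+n=")' + re.escape(old) + r'(")')
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), dix_text)


# Made-up fragment for demonstration; not taken from apertium-pol.
sample = (
    '<sdef n="imperf" c="imperfective"/>\n'
    '<e><p><l>a</l><r>a<s n="vblex"/><s n="imperf"/></r></p></e>\n'
)
print(rename_tag(sample))
```

Running it on a copy of the dictionary first and diffing the result is a cheap way to check that nothing else was touched.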


Oh, didn't know about the difference. Sure, I'll change it. Have you
noticed anything else?


Nothing that I can tell yet.

F.
