On 2018-05-31 23:36, Grzegorz Kulik wrote:
Okay, I've transferred apertium-szl and apertium-pol-szl to Apertium
on Github.
Great, I've made a couple of changes to apertium-szl.
I want to sort the Polish dictionary too, but I see dozens of
errors in the one on your side. Has anybody ever tried to compile it?
There are dozens of duplicated key sequences, and all those thousands
of machine-generated words that end with -ość are generated
improperly:
<e lm="realizacyjność"><i>realizacyjność</i><par n="miłoś/ć__n"/></e>
while it should be:
<e lm="realizacyjność"><i>realizacyjnoś</i><par n="miłoś/ć__n"/></e>
(no ć between the <i> tags)
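A minimal sketch of how entries like this could be repaired in bulk. It
assumes each entry sits on one line in exactly the shape shown above, and
that the paradigm name follows the "stem/suffix__pos" pattern, so the
suffix of "miłoś/ć__n" is "ć". The names ENTRY, paradigm_suffix and
fix_entry are hypothetical, not part of any Apertium tool.

```python
import re

# Matches single-line entries of the shape shown above (an assumption;
# entries containing other elements inside <i> are left untouched).
ENTRY = re.compile(r'<e lm="([^"]*)"><i>([^<]*)</i><par\s+n="([^"]*)"/></e>')


def paradigm_suffix(par_name):
    """Return the part between "/" and "__", e.g. "miłoś/ć__n" -> "ć"."""
    head = par_name.split("__", 1)[0]
    return head.split("/", 1)[1] if "/" in head else ""


def fix_entry(line):
    """Strip the paradigm suffix from <i>...</i> when the full lemma was put there."""
    match = ENTRY.search(line)
    if match is None:
        return line
    lemma, stem, par_name = match.groups()
    suffix = paradigm_suffix(par_name)
    # Only touch entries where the stem is the whole lemma and the lemma
    # really ends with the suffix that the paradigm will add again.
    if suffix and stem == lemma and lemma.endswith(suffix):
        fixed_stem = stem[: -len(suffix)]
        return line[: match.start(2)] + fixed_stem + line[match.end(2):]
    return line


broken = '<e lm="realizacyjność"><i>realizacyjność</i><par n="miłoś/ć__n"/></e>'
print(fix_entry(broken))
```

Entries whose <i> content already differs from the lemma are returned
unchanged, so running this twice over the same file should be harmless.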
But not only that. All the words after line 24435 (the -ość ones)
don't exist. And they make up around 2/3 of the dictionary. I
understand it was easy for someone to machine-generate them by adding
prefixes and suffixes to actual words, but let me translate some of the
first twenty for you:
nieobżartość - un-fed-up-ness
niepozauranowość - un-outside-uranium-ness
niebiałoramienność - un-white-arm-ness
niejeżowość - un-hedgehog-ness
nieprzysłoneczność - un-at-sun-ness
niewołowatość - un-ox-ness
If you delete those, you're left with a mere 15792 entries. Wouldn't it
be better to just use the dictionary I made manually? It's proven to
work, it's twice as large, and it was built on the Polish monodix
found on the SVN in January 2016; I just got rid of errors and added
entries.
What do you think?
That sounds fine to me. There is pol-ces in staging/, but as far as
I know there has been no released pair with Polish yet.
Jim, what do you think?
In general, if you're willing to maintain it, I'd say that given
there are no other released pairs yet, you should get priority
to decide what content it has.
Have you calculated the coverage for both dictionaries?
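
For reference, a rough sketch of a naive coverage count: the share of
tokens that get at least one analysis. It assumes the corpus has already
been run through the analyser and is in the usual stream format, where
each token looks like ^surface/analysis$ and unknown words are marked
with a leading "*". The function name and the sample tags are made up
for illustration.

```python
import re

# One lexical unit: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/^$]*)/([^$]*)\$")


def naive_coverage(analysed_text):
    """Fraction of lexical units whose first analysis is not marked unknown."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Hypothetical analyser output; the tags are illustrative only.
sample = "^dom/dom<n><mi><sg><nom>$ ^niejeżowość/*niejeżowość$ ^jest/być<vbser><pri><p3><sg>$"
print(f"{naive_coverage(sample):.2%}")
```

Running the same corpus through both compiled dictionaries and comparing
the two numbers would give a first idea of which one covers more text.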
Another option would be to keep the old apertium-pol in a branch,
and copy yours in as master.
Fran