Okay, I've transferred apertium-szl and apertium-pol-szl to Apertium on GitHub. I want to sort out the Polish dictionary too, but I see many errors in the one on your side. Has anybody ever tried to compile it? There are dozens of duplicated key sequences, and the thousands of machine-generated words ending in -ość are all generated improperly:

    <e lm="realizacyjność"><i>realizacyjność</i><par n="miłoś/ć__n"/></e>

while it should be (no ć between the <i> tags):

    <e lm="realizacyjność"><i>realizacyjnoś</i><par n="miłoś/ć__n"/></e>
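The paradigm name in these entries appears to encode the split: the text before `/` is an example stem and the text after it (here `ć`) is the ending the paradigm itself supplies, so the `<i>` stem should be the lemma minus that ending. Below is a minimal Python sketch of a consistency check along those lines. It assumes the `stem/ending__pos` naming convention holds and that a correct entry satisfies stem + ending == lemma; that is a heuristic, so paradigms with irregular citation forms will be flagged too. It also counts repeated (lemma, stem, paradigm) triples as a rough proxy for some of the duplicates, which will not catch every cause of duplicated key sequences. `SAMPLE`, `paradigm_ending` and `check_entries` are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical three-entry sample in monodix shape; not taken from the real file.
SAMPLE = """<dictionary><section id="main" type="standard">
<e lm="realizacyjność"><i>realizacyjność</i><par n="miłoś/ć__n"/></e>
<e lm="miłość"><i>miłoś</i><par n="miłoś/ć__n"/></e>
<e lm="miłość"><i>miłoś</i><par n="miłoś/ć__n"/></e>
</section></dictionary>"""


def paradigm_ending(par_name):
    """Ending a paradigm appends, read from its name: 'miłoś/ć__n' -> 'ć'.

    Assumes the stem/ending__pos naming convention; names without '/'
    are treated as adding no ending.
    """
    head = par_name.split("__", 1)[0]
    return head.split("/", 1)[1] if "/" in head else ""


def check_entries(xml_source):
    """Return (bad_stems, duplicates) for simple <i>+<par> entries.

    bad_stems: (lemma, stem, ending, suggested_stem) where stem + ending != lemma.
    duplicates: (lemma, stem, paradigm) keys that occur more than once.
    """
    root = ET.fromstring(xml_source)
    bad_stems = []
    counts = Counter()
    for e in root.iter("e"):
        lemma, i, par = e.get("lm"), e.find("i"), e.find("par")
        # Skip entries this heuristic does not model (no lemma, <p> pairs,
        # or <i> containing child elements such as <b/>).
        if lemma is None or i is None or par is None or len(i):
            continue
        stem = i.text or ""
        name = par.get("n", "")
        ending = paradigm_ending(name)
        if stem + ending != lemma:
            # Suggest a fix by stripping the paradigm's ending from the lemma.
            suggested = None
            if lemma.endswith(ending):
                suggested = lemma[: len(lemma) - len(ending)]
            bad_stems.append((lemma, stem, ending, suggested))
        counts[(lemma, stem, name)] += 1
    duplicates = [key for key, n in counts.items() if n > 1]
    return bad_stems, duplicates


if __name__ == "__main__":
    bad, dups = check_entries(SAMPLE)
    for lemma, stem, ending, suggested in bad:
        print(f"{lemma}: <i>{stem}</i> + {ending!r}; expected <i>{suggested}</i>")
    for key in dups:
        print("duplicate entry:", key)
```

To try it on a real dictionary one could pass the file's bytes, e.g. `check_entries(open(path, "rb").read())` with `path` pointing at the monodix; entries that use `<p>` pairs or `<b/>` inside `<i>` are skipped rather than checked.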

But not only that. None of the words after line 24435 (the -ość ones) actually exist, and they make up around 2/3 of the dictionary. I understand it was easy for someone to machine-generate them by adding prefixes and suffixes to actual words, but let me translate some of the first twenty for you:

nieobżartość - un-fed-up-ness
niepozauranowość - un-outside-uranium-ness
niebiałoramienność - un-white-arm-ness
niejeżowość - un-hedgehog-ness
nieprzysłoneczność - un-at-sun-ness
niewołowatość - un-mule-ness

If you delete those, you're left with a mere 15,792 entries. Wouldn't it be better to just use the dictionary I made manually? It's proven to work, it's twice as large, and it was built on the Polish monodix found on SVN in January 2016; I just got rid of the errors and added entries.

What do you think?

Greg


On 31.05.2018 11:45, Francis Tyers wrote:
On 2018-05-29 20:07, Grzegorz Kulik wrote:
On 29.05.2018 12:38, Francis Tyers wrote:
On 2018-05-29 11:12, Grzegorz Kulik wrote:
Hi,

I've been developing the Polish - Silesian Apertium pair for some time
and the translations have become reasonable, so I reckon it's time to
publish them. Of the 10,000 most frequent Polish words it covers
nearly 9,000 (the rest is on its way), and it handles more than 21,000
words altogether.

https://github.com/gkkulik/apertium-pol

https://github.com/gkkulik/apertium-pol-szl

https://github.com/gkkulik/apertium-szl

It still needs some fine tuning as sometimes it gives slightly amusing
output. I improve it regularly because I use it daily to translate
news so I get rid of any spotted mistakes. I hope you people can also
give me some tips since you obviously know much more about the
technical aspects of Apertium.

Wow, great! Where is the news published?

We have a local website that pays more attention to local culture and
history: https://wachtyrz.eu

The name means "Guardian" because I thought we should aim high. ;)

Cool, perhaps the corpus could be released at some point. I'm sure
there would be a lot of academic interest in a Polish--Silesian parallel corpus. :)

[..snip..]

When the pair is published in trunk and on the website, I want to make
it a media event here in Upper Silesia which means increased traffic.
Is that okay? Sorry if this question is silly. :)


Yes that would be great!

Would you be interested in moving the code to the Apertium project to
be able to take advantage of including it in the website and APy?

Tino: Do you know what the process would be for that?

Fran

Yeah, I want it to be included in the Apertium project.

Ok, I've invited you to the organisation. You should first accept
the invitation. You should look at:

https://help.github.com/articles/transferring-a-repository-owned-by-your-personal-account/

For apertium-pol-szl and apertium-szl there shouldn't be any problem.

For apertium-pol, we should do something different as there is an existing
module. Probably the right thing to do is:

* fork it
* apply your local changes
* send a pull request
* delete the fork

If you're not comfortable with doing this, then perhaps someone in Apertium could
do it for you.

I was already
told by people at the Silesian University in Katowice that they'd be
interested in helping to develop a Silesian - Czech pair, which would
be awesome. I was also told there is interest in developing a
Serbo-Croatian - Silesian pair and a Ukrainian - Silesian one, so you
might expect some considerable input from my area.

That's great news!

Oh, and just one more question about stuff I'm going to need for the
press release: what projects use Apertium? Wikipedia uses it where
possible if I remember correctly. Is there a list of those?

We have this page on the Wiki:

http://wiki.apertium.org/wiki/Press

But it isn't kept completely up to date.

Regards,

Fran


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
