On 29.05.2018 12:38, Francis Tyers wrote:
On 2018-05-29 11:12, Grzegorz Kulik wrote:
Hi,

I've been developing the Polish - Silesian Apertium pair for some time
and the translations have become reasonable so I reckon it's time to
publish them. Of the 10,000 most frequent Polish words it covers
nearly 9,000 (the rest are on their way), and it handles more than
21,000 words altogether.

https://github.com/gkkulik/apertium-pol

https://github.com/gkkulik/apertium-pol-szl

https://github.com/gkkulik/apertium-szl

It still needs some fine-tuning, as it sometimes gives slightly amusing
output. I improve it regularly: I use it daily to translate news, so I
fix any mistakes I spot. I hope you people can also give me some tips,
since you obviously know much more about the technical aspects of
Apertium.

Wow, great! Where is the news published?

We have a local website that pays more attention to local culture and history: https://wachtyrz.eu

The name means "Guardian" because I thought we should aim high. ;)


Questions:

I want to improve the translation by developing hand-tagged corpora for
both languages. How large do they need to be to give reasonable results?

Well, starting from 10,000 tokens or so. You might be able to convert
some sentences from an existing corpus (e.g. UD_Polish), but it might
be better to tag from scratch. I would start by trying the unigram
tagger (it's much easier to train) and, if that isn't good enough, try
the perceptron tagger.

Great! Thank you, I'll look into it.
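
To illustrate the unigram idea mentioned above: a unigram tagger simply gives each word form the tag it received most often in the hand-tagged corpus. The Python below is only a generic sketch of that idea, not Apertium's own tagger or its training procedure; the function names, the toy corpus, and the tag labels are all made up for the example.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_sentences):
    """Count how often each word form carries each tag, keep the most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag(model, words, default="UNK"):
    """Tag each word with its most frequent tag; unseen words get the default tag."""
    return [(w, model.get(w.lower(), default)) for w in words]

# Hypothetical hand-tagged toy corpus (tag labels are illustrative only).
corpus = [
    [("to", "prn"), ("jest", "vbser"), ("dom", "n")],
    [("to", "prn"), ("jest", "vbser"), ("to", "det")],
    [("to", "prn"), ("dom", "n")],
]

model = train_unigram(corpus)
tagged = tag(model, ["To", "jest", "auto"])
```

Here "to" was tagged "prn" three times and "det" once, so the model always picks "prn" for it, and the unseen word "auto" falls back to the default tag. With only unigram counts, more hand-tagged tokens mainly help by covering more word forms, which is why a corpus of around 10,000 tokens is a sensible starting point.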

There was a great PDF Apertium developer manual but I cannot find it
anywhere. Can anybody point me in the right direction?

Is this the one you are referring to?

http://xixona.dlsi.ua.es/~fran/apertium2-documentation.pdf

Yes, that's the thing. Thank you very much!

When the pair is published in trunk and on the website, I want to make
it a media event here in Upper Silesia which means increased traffic.
Is that okay? Sorry if this question is silly. :)


Yes that would be great!

Would you be interested in moving the code to the Apertium project, so
that it can be included on the website and in APy?

Tino: Do you know what the process would be for that?

Fran

Yeah, I want it to be included in the Apertium project. I was already told by people at the Silesian University in Katowice that they'd be interested in helping to develop a Silesian - Czech pair, which would be awesome. I was also told there is interest in developing a Serbo-Croatian - Silesian pair and a Ukrainian - Silesian one, so you can expect considerable input from my area.

Oh, and just one more question about stuff I'm going to need for the press release: what projects use Apertium? Wikipedia uses it where possible if I remember correctly. Is there a list of those?

Greg


Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
