Cool!

Mikel

P.S. Actually, why not щетка- -да- -ғы (щетка+locative+adjectivizer)?

On 6/3/19 at 17:32, Francis Tyers wrote:
On 2019-03-06 at 14:40, Antonio Toral wrote:
Dear apertiumers,

I would like to do morph segmentation for Kazakh and I've seen that
this is possible with Apertium [1].

However, in the example shown in that webpage the output doesn't seem
to be pure segmentation:

$ echo "щеткадағы" | hfst-proc kaz.segmenter
^щеткадағы/щетка>{D}{A}{G}{I}$
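
For reference, the analysis line above can be taken apart with a few lines of Python. This is only a minimal sketch: it assumes the `^surface/analysis$` layout shown in the output, a single analysis per unit, and that `>` marks a morpheme boundary. `split_analysis` is a hypothetical helper, not part of Apertium or HFST.

```python
import re

def split_analysis(line):
    """Split one '^surface/analysis$' unit into (surface, [morphs]).

    Assumes a single analysis and '>' as the morpheme boundary symbol.
    """
    m = re.fullmatch(r"\^([^/]*)/(.*)\$", line.strip())
    if m is None:
        raise ValueError(f"not a single ^surface/analysis$ unit: {line!r}")
    surface, analysis = m.groups()
    return surface, analysis.split(">")

surface, morphs = split_analysis("^щеткадағы/щетка>{D}{A}{G}{I}$")
print(surface)  # the surface form
print(morphs)   # the stem followed by the unresolved archiphoneme string
```

The second element still contains the archiphonemes {D}{A}{G}{I}, not surface letters, which is why this output is not yet a pure segmentation.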

Is it possible to obtain segmentation instead? I.e.
c

Hi Antonio,

Thanks for your email! :D

You're right that it isn't pure segmentation. There is some good news
and some bad news.

The good news is that getting the 'pure' segmentation is definitely possible
and without too much effort.

Essentially, the problem is that, the way the
phonological rules are defined, some of them depend on 0 (empty) symbols
on the surface side of the string. The morpheme boundary currently always
goes to empty, so if we set it to not go to empty, then some of those
rules will break.

Fixing that means editing the rules to change the relevant contexts to ask for
0 aside from the morpheme boundary on the surface. This shouldn't take
too long.

The bad news is that it isn't done yet, but given that
Kazakh is in WMT this year, it is definitely something we are planning
to implement, hopefully in the next couple of days.

Regards,

Fran




_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff

--
Mikel L. Forcada  http://www.dlsi.ua.es/~mlf/
Departament de Llenguatges i Sistemes Informàtics
Universitat d'Alacant
E-03690 Sant Vicent del Raspeig
Spain
Office: +34 96 590 9776


