Re: [Apertium-stuff] morph segmentation with Apertium

Francis Tyers Wed, 06 Mar 2019 08:33:35 -0800

On 2019-03-06 14:40, Antonio Toral wrote:

Dear apertiumers,


I would like to do morph segmentation for Kazakh and I've seen that
this is possible with Apertium [1].

However, in the example shown in that webpage the output doesn't seem
to be pure segmentation:

$ echo "щеткадағы" | hfst-proc kaz.segmenter
^щеткадағы/щетка>{D}{A}{G}{I}$

Is it possible to obtain segmentation instead? I.e.
щетка>дағы


Hi Antonio,

Thanks for your email! :D

You're right that it isn't pure segmentation. There is some good news
and some bad news.

The good news is that getting the 'pure' segmentation is definitely possible,
and without too much effort.

Essentially, the problem is that, because of the way the phonological
rules are defined, some of them depend on 0 (empty) symbols on the
surface side of the string. The morpheme boundary currently always goes
to empty, so if we set it not to go to empty, some of those rules will
break.

Fixing that means editing the rules so that the relevant contexts accept
the morpheme boundary on the surface in addition to 0. This shouldn't
take too long.
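Until the rules are changed, a rough stop-gap is to project the `>` boundaries from the analysis back onto the surface form after running the segmenter. The Python below is a minimal sketch of that idea, not part of Apertium or HFST: the function names `segment` and `segment_stream` are hypothetical, it only reads the first analysis of each `^surface/analysis$` unit, and it assumes every letter or `{X}` archiphoneme in the analysis corresponds to exactly one surface character. That assumption fails whenever a symbol is realised as 0 on the surface; in that case the sketch gives up and returns the unsegmented word.

```python
import re
from typing import List, Optional

# One Apertium stream lexical unit: ^surface/analysis[/analysis...]$
LU_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")
# Analysis symbols: an archiphoneme like {A}, a boundary '>', or a single character
SYM_RE = re.compile(r"\{[^}]*\}|>|.")


def segment(surface: str, analysis: str) -> Optional[str]:
    """Copy the '>' boundaries of `analysis` onto `surface`.

    ASSUMPTION: every non-boundary symbol (literal letter or {X}
    archiphoneme) matches exactly one surface character. Returns None
    when that one-to-one alignment is impossible.
    """
    out = []
    pos = 0
    for sym in SYM_RE.findall(analysis):
        if sym == ">":
            out.append(">")
            continue
        if pos >= len(surface):
            return None  # more analysis symbols than surface characters
        # A literal letter must equal the surface letter; an archiphoneme matches any
        if not sym.startswith("{") and sym != surface[pos]:
            return None
        out.append(surface[pos])
        pos += 1
    # Every surface character must have been consumed
    return "".join(out) if pos == len(surface) else None


def segment_stream(line: str) -> List[str]:
    """Segment each lexical unit in a line of segmenter output (first analysis only)."""
    results = []
    for surface, analyses in LU_RE.findall(line):
        first = analyses.split("/")[0]
        seg = segment(surface, first)
        # Fall back to the unsegmented surface form if alignment fails
        results.append(seg if seg is not None else surface)
    return results


print(segment_stream("^щеткадағы/щетка>{D}{A}{G}{I}$"))  # → ['щетка>дағы']
```

Each line of `hfst-proc kaz.segmenter` output could be passed through `segment_stream`; words where an archiphoneme surfaces as 0 simply come back unsegmented rather than wrongly split.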

The bad news is that it isn't done yet, but given that Kazakh is in WMT
this year, it is definitely something we are planning to implement,
hopefully in the next couple of days.

Regards,

Fran




_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
