On 2019-03-07 09:22, Antonio Toral wrote:
Hi Jonathan, Fran,

Thanks for looking at this, I really appreciate it :)

From those two options, I think the first would be better.

For your purposes I agree.

If I got it right, the 1st is pure segmentation while the 2nd inserts an
additional д.

No, it's the other way around :)

In the first you lose a д through the process поезд-де -> поез-де:
there are two underlying д, but one is removed.

Segmenting поезде as поез>де (1st option) would allow us to recover
the original word easily from the segmented version. Segmenting as
поезд>де (2nd option) would not, as we might wrongly recover the
original word as поездде.
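
To make the recoverability point concrete, here is a minimal Python sketch. It is not taken from the prototype mentioned in this thread: `join_segments` is a hypothetical helper, and it assumes `>` is the only boundary marker and that recovery means simply deleting it.

```python
def join_segments(segmented: str, boundary: str = ">") -> str:
    """Rebuild a word by deleting the boundary marker (assumed to be '>')."""
    return segmented.replace(boundary, "")


surface = "поезде"      # the original surface form
option_1 = "поез>де"    # 1st option: segments concatenate back to the surface form
option_2 = "поезд>де"   # 2nd option: keeps both underlying д

# Option 1 round-trips to the surface form; option 2 does not.
print(join_segments(option_1) == surface)
print(join_segments(option_2) == surface)
```

Under these assumptions only the 1st option round-trips; the 2nd would need an extra rule to drop the doubled д before the original word could be recovered.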


This is correct, my intuition was that you wanted to keep the
segmented version as close to the surface form as possible.

We have a prototype (thanks Jonathan!), but it needs tweaking and
testing. Hopefully in the next couple of days...

Fran


_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
