Jonathan Washington
<jonathan.n.washing...@gmail.com> wrote:

> Hi all,
>
> I have some students trying to trigger alternations across word boundaries
> like the following:
> inh japỹ / ijapỹ
> inh tũ / isũ
>
> These alternations are consistently triggered by certain common words that
> end in "nh".
>
> They're using lexc/twol for the morphological generator.
>
> Our first approach was to put a literal ~ in "inh", i.e., the form was
> "i~nh".  This successfully triggered <a /> in the post-dix, though we got
> slightly mangled output:
> ij\/japỹ (or similar)

That slash seems like a bug. Could you post the exact input to
"lt-proc -p" (output of your generator) and the post.dix?

> Also, this isn't quite an ideal approach.  I suppose we could fairly easily
> automate the insertion of ~ before every nh in the lttoolbox (bin) version
> of the HFST transducer.  But it still seems to be somewhat buggy.
>
> Are there any other solutions that people have gotten to work?

IIUC, those kinds of word boundary-crossing changes are exactly what the
postgenerator is supposed to handle, though it is annoying to have to
insert the mark. I've been manually inserting the <a/> on double
consonants at the ends of words that can compound (to avoid getting
triple consonants if the next word starts with the same one), but doing it
manually is error-prone, and it's noisy in the .dix file.
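
As a rough sketch of taking the hand work out of the marking step, a text-level pass could run between the generator and `lt-proc -p` and add the mark itself. This is a different technique from patching the lttoolbox binary of the HFST transducer: it only rewrites the generator's output string. The name `mark_word_final_nh` is hypothetical, and the sketch assumes plain space-separated text with no superblanks or escapes.

```python
import re

# Hypothetical helper (not part of lttoolbox or HFST): put the postgeneration
# mark "~" directly before a word-final "nh", mirroring the hand-written
# "i~nh" form. "Word-final" means followed by whitespace or end of text.
# The negative lookbehind skips words that are already marked.
# Assumes plain text: Apertium superblanks ([...]) and escapes are ignored.
WORD_FINAL_NH = re.compile(r"(?<!~)nh(?=\s|$)")


def mark_word_final_nh(text: str) -> str:
    return WORD_FINAL_NH.sub("~nh", text)


print(mark_word_final_nh("inh japỹ inh tũ"))  # i~nh japỹ i~nh tũ
```

This marks every word-final "nh", not only the common words in question, so the post.dix rules would still have to decide what actually changes.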

Is there any reason postgen couldn't just run on *everything* LRLM and
only apply the changes where it matches (as if it were a version of sed
that respects deformatting)? Then you could just do
<l>inh<b/>t</l> <r>is</r>
in post.dix and have no changes to the hfst.
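
To make the proposal concrete, here is a minimal Python sketch of a postgenerator that scans everything left to right with longest match and only rewrites where a rule matches. It is not how `lt-proc -p` is implemented. The names `RULES`, `longest_match` and `postgen_lrlm` are hypothetical, a plain space stands in for `<b/>`, and deformatting is not modelled.

```python
from typing import Dict, List, Optional

# Hypothetical sketch of the proposal, not lt-proc's implementation:
# scan left to right and, at each word start, apply the longest rule whose
# left side matches (LRLM); everywhere else copy the text unchanged.
# A plain space stands in for <b/>; deformatting/superblanks are not modelled.
RULES = {
    "inh t": "is",  # inh tũ   -> isũ
    "inh j": "ij",  # inh japỹ -> ijapỹ
}


def longest_match(text: str, i: int, rules: Dict[str, str], max_len: int) -> Optional[str]:
    # Try candidate lengths from longest to shortest; the first hit is longest.
    for n in range(min(max_len, len(text) - i), 0, -1):
        if text[i:i + n] in rules:
            return text[i:i + n]
    return None


def postgen_lrlm(text: str, rules: Dict[str, str]) -> str:
    out: List[str] = []
    max_len = max(map(len, rules), default=0)
    i = 0
    while i < len(text):
        # Assumption: rules may only start at a word boundary, standing in
        # for where the "~" mark would otherwise sit.
        at_word_start = i == 0 or text[i - 1].isspace()
        key = longest_match(text, i, rules, max_len) if at_word_start else None
        if key is not None:
            out.append(rules[key])
            i += len(key)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(postgen_lrlm("inh tũ inh japỹ", RULES))  # isũ ijapỹ
```

The word-start check is a design choice: without it, the scan would also rewrite inside a longer word that merely ends in "inh" (for example "sinh tũ"), which the mark currently prevents.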

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
