Jonathan Washington <jonathan.n.washing...@gmail.com> wrote:
> Hi all,
>
> I have some students trying to trigger alternations across word boundaries
> like the following:
> inh japỹ / ijapỹ
> inh tũ / isũ
>
> These alternations consistently triggered with certain common words that
> end in "nh".
>
> They're using lexc/twol for the morphological generator.
>
> Our first approach was to put a literal ~ in "inh", i.e., the form was
> "i~nh". This successfully triggered <a /> in the post-dix, though we got
> slightly mangled output:
> ij\/japỹ (or similar)

That slash seems like a bug. Could you post the exact input to "lt-proc -p"
(the output of your generator) and the post.dix?

> Also, this isn't quite an ideal approach. I suppose we could fairly easily
> automate the insertion of ~ before every nh in the lttoolbox (bin) version
> of the HFST transducer. But it still seems to be somewhat buggy.
>
> Are there any other solutions that people have gotten to work?

IIUC, those kinds of word-boundary-crossing changes are exactly what the
postgenerator is supposed to handle, though it is annoying to have to insert
the mark. I've been manually inserting the <a/> on double consonants at the
ends of words that can compound (to avoid getting triple consonants if the
next word starts with the same one), but doing it by hand is error-prone, and
it's noisy in the .dix file.

Is there any reason postgen couldn't just run on *everything* LRLM and only
apply the changes where it matches (as if it were a version of sed that
respects deformatting)? Then you could just do

<l>inh<b/>t</l> <r>is</r>

in post.dix and have no changes to the hfst.
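
To make the "sed that matches LRLM" idea concrete, here is a rough sketch in Python. It is not how lt-proc works internally: the function name `apply_rules_lrlm` and the `RULES` table are made up for illustration, the second rule (`inh j` to `ij`) is inferred from the example pairs quoted above, and the sketch ignores deformatting (superblanks) and word-initial anchoring entirely.

```python
def apply_rules_lrlm(text, rules):
    """Scan left to right; at each position apply the longest matching rule.

    `rules` maps a left-hand string (what the generator outputs) to the
    right-hand string that should replace it. Hypothetical helper, not part
    of lttoolbox.
    """
    # Try longer left-hand sides first so the longest match wins.
    lefts = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for left in lefts:
            if text.startswith(left, i):
                out.append(rules[left])
                i += len(left)
                break
        else:
            # No rule matches here: copy one character through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Assumed rules, mirroring <l>inh<b/>t</l> <r>is</r> and the "inh japỹ" example.
RULES = {"inh t": "is", "inh j": "ij"}

print(apply_rules_lrlm("inh tũ", RULES))
print(apply_rules_lrlm("inh japỹ", RULES))
```

Text that matches no rule passes through untouched, which is the point: no ~ or <a/> mark would be needed in the generator output.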
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff