Hi all,
I've added a new mode to lt-proc, -x, named (for now) inter-generation;
suggestions for a better name and flag are more than welcome :)
You can find the new feature in this branch (it is waiting for another
pair of eyes to make sure it works):
https://github.com/apertium/lttoolbox/pull/29/
It's meant to act on the same type of FST as the post-generation modules,
with some key differences:
- It doesn't remove every ~, only the ones that matched.
- It fully processes the characters found on the left part.
  - Post-generation mode is meant to act on strings like ~word<b/>n,
    where word changes based on the "n" character that comes after it.
    Because of that, it moves processing back and n gets reprocessed.
  - Inter-generation, on the other hand, consumes all the characters
    matched in the input string. That brings some benefits, like being
    able to change a string at the end of the input without needing a
    blank plus another word after it. It also adds some constraints,
    like limiting changes to the words themselves.
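
To make the two differences above concrete, here is a toy sketch in
Python. It is not how lt-proc is implemented: the real mode runs a
compiled FST, while this uses a plain dictionary of literal patterns.
The function name, the rule table and the sample rule ("~de el" to
"del") are all hypothetical, chosen only for illustration.

```python
def intergen_sketch(text, rules):
    """Toy model of the behaviour described above (NOT lt-proc's code).

    `rules` is a hypothetical table mapping a literal pattern that
    starts with '~' to its replacement string.
    """
    out = []
    i = 0
    # Try longer patterns first so the longest match wins.
    patterns = sorted(rules, key=len, reverse=True)
    while i < len(text):
        for pat in patterns:
            if text.startswith(pat, i):
                out.append(rules[pat])
                # Consume everything that matched; nothing is reprocessed.
                i += len(pat)
                break
        else:
            # Unmatched characters, including an unmatched '~', pass through.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical rule: contract "~de el" into "del".
rules = {"~de el": "del"}
print(intergen_sketch("~a casa ~de el", rules))  # prints "~a casa del"
```

The first ~ has no matching rule, so it stays in the output, and the
match at the very end of the input succeeds without a trailing blank or
another word.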
Overall, this new mode allows the following scenario, with multiple
inter-generation steps (similar to the interchunk/postchunk stages in
transfer):
... | lt-proc -g lang.autogen.bin \
    | lt-proc -x lang.autointergen.1.bin \
    | lt-proc -x lang.autointergen.2.bin \
    | ... \
    | lt-proc -x lang.autointergen.n.bin \
    | lt-proc -p lang.autopgen
Let me know if you have any feedback, suggestions, or questions.
Thanks!
--
< Xavi Ivars >
< http://xavi.ivars.me >