Hi all,

I've added a new mode to lt-proc, -x, named (for now) inter-generation. A
better name and flag are more than welcome :)

You can find the new feature in this pull request (it's waiting for another
pair of eyes to make sure it works):

https://github.com/apertium/lttoolbox/pull/29/

It's meant to act on the same type of FST as the post-generation modules,
with some key differences:

   - It doesn't remove every ~, only the ones that were part of a match.
   - It fully consumes the characters found on the left side:
      - Post-generation mode is meant to act on strings like ~word<b/>n,
      where word changes depending on the "n" character that follows it.
      Because of that, after a match it moves processing back, and n gets
      reprocessed.
      - Inter-generation, on the other hand, consumes all the characters
      matched in the input string. That has some benefits, like being able
      to change a string at the end of the input without needing a blank
      and another word after it. It also adds some constraints, like
      limiting changes to the words themselves.
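
A toy Python model of the differences listed above may help. It is only a
sketch under stated assumptions, not lttoolbox's actual implementation: a
plain space stands in for <b/>, rules are a hypothetical dict from a
~-prefixed input pattern to its replacement, and rewrite() is a made-up
helper name for illustration.

```python
# Toy model (NOT lttoolbox's implementation) of post- vs inter-generation.
# Assumptions: a plain space stands in for the <b/> blank, and "rules" is a
# hypothetical dict mapping a "~"-prefixed input pattern to its replacement.
from __future__ import annotations


def rewrite(text: str, rules: dict[str, str], mode: str) -> str:
    """Apply ~-triggered rules to text; mode is "post" or "inter"."""
    out: list[str] = []
    i = 0
    while i < len(text):
        if text[i] != "~":
            out.append(text[i])
            i += 1
            continue
        # Longest rule pattern that matches at position i, if any.
        match = max((p for p in rules if text.startswith(p, i)),
                    key=len, default=None)
        if match is None:
            # Post-generation drops every ~; inter-generation keeps
            # the ~ marks that were not part of a match.
            if mode == "inter":
                out.append("~")
            i += 1
            continue
        repl = rules[match]
        if mode == "post" and " " in match and " " in repl:
            # Emit the replacement up to its last blank, then move processing
            # back so the input after the pattern's last blank is re-read.
            out.append(repl[: repl.rindex(" ") + 1])
            i += match.rindex(" ") + 1
        else:
            # Inter-generation (or a rule with no blank) consumes
            # everything the pattern matched.
            out.append(repl)
            i += len(match)
    return "".join(out)
```

For example, with rules = {"~word ~x": "WORD ~x", "~x": "X"}, the input
"~word ~x" gives "WORD X" in "post" mode (the second word is re-read and
rewritten again) but "WORD ~x" in "inter" mode (everything matched is
consumed). With no rules, "~foo bar" becomes "foo bar" in "post" mode and
stays "~foo bar" in "inter" mode.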


Overall, this new mode allows the following scenario, with multiple
inter-generation steps (similar to inter-chunk/post-chunk in structural
transfer):

    ... | lt-proc -g lang.autogen.bin |
      lt-proc -x lang.autointergen.1.bin |
      lt-proc -x lang.autointergen.2.bin | ... |
      lt-proc -x lang.autointergen.n.bin |
      lt-proc -p lang.autopgen.bin

Let me know if you have any feedback/suggestions/questions/...

Thanks!
-- 
< Xavi Ivars >
< http://xavi.ivars.me >
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff