Hey guys,

Using wordbound blanks, we've modified the Apertium pipeline, modules and stream so that inline markup tags now move with words during transfer, merge when LUs merge, and split when LUs split. This preserves the formatting of the input document. If you want to follow the further development of this project, see <https://wiki.apertium.org/wiki/User:Khannatanmai/Wordbound_blanks>.
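To make the idea concrete, here is a toy Python sketch of markup that travels with words. It is only an illustration under stated assumptions, not how the Apertium modules or the stream format actually work: the LU class and the merge, split and render helpers are hypothetical names invented for this example, and grouping adjacent words with identical tags at output time is also an assumption.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class LU:
    """Toy lexical unit: a word plus the inline tags bound to it (hypothetical)."""
    word: str
    tags: tuple = ()


def merge(units, word):
    """Merge several LUs into one; the result carries every tag, in first-seen order."""
    tags = []
    for unit in units:
        for tag in unit.tags:
            if tag not in tags:
                tags.append(tag)
    return LU(word, tuple(tags))


def split(unit, words):
    """Split one LU into several; each piece inherits the original tags."""
    return [LU(word, unit.tags) for word in words]


def render(units):
    """Re-emit markup, wrapping each run of adjacent LUs that share the same tags."""
    parts = []
    for tags, group in groupby(units, key=lambda unit: unit.tags):
        text = " ".join(unit.word for unit in group)
        for tag in reversed(tags):  # innermost tag is wrapped first
            text = f"<{tag}>{text}</{tag}>"
        parts.append(text)
    return " ".join(parts)


# Source: Hello <b>big green</b> <i>world</i>
hello, big, green, world = (
    LU("Hello"), LU("big", ("b",)), LU("green", ("b",)), LU("world", ("i",))
)

# Hand-written stand-in for transfer: words are translated and reordered,
# and each output word keeps the tags of the input word it came from.
target = [
    LU("Hola", hello.tags),
    LU("Mundo", world.tags),
    LU("verde", green.tags),
    LU("grande", big.tags),
]
print(render(target))  # Hola <i>Mundo</i> <b>verde grande</b>
```

In this toy model, merging a bold LU with an italic LU gives one LU that is both bold and italic, and splitting a bold LU gives pieces that are all bold.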
We have a decent version ready for testing that handles markup in HTML documents. It will undergo extensive testing as part of this project, but I thought it would be a good idea to let the community test it on their own language pairs, based on their own needs, so that we can understand which features need to be added and what needs to be fixed.

Apertium users have been asking for markup handling for quite some time and have had no option but to use wrappers that try to guess alignments. I'm hoping this project helps in that regard.

Here's what you need to test this:
- Make sure you have the latest commits of apertium and lttoolbox installed.
- Make sure you have the latest commits of apertium-separable, apertium-anaphora, etc. if you're using those in your mode.
- Clone and install https://github.com/TinoDidriksen/transfuse .

After this, all you need to do is pipe your HTML document to tf-html-fragment and pass it, as an argument, a translation mode of your language pair of choice (full translation modes).

Example:

$ echo 'Hello <b>big green</b> <i>world</i>!' | tf-html-fragment /Users/khannatanmai/Documents/GSoC/repo/main/apertium-eng-spa/modes/eng-spa.mode
Hola <i>Mundo</i> <b>verde grande</b> !

It only works for HTML right now, but we're in the process of supporting all the usual document types.

*Known issues:*
- If a transfer rule has multiple words in its pattern, and the output contains an LU that wasn't clipped from any word in the input, that LU won't get a wordbound blank.
- If apertium-separable detects a string of words, the formatting of each word is combined and applied to the entire string of words.
- apertium-recursive isn't supported as of now. It will be by the end of the project, though.

If you have any questions or suggestions, I'd be glad to respond to them on this thread. If you need help testing this on your language pair, you can contact us on IRC. The same goes if you find any bugs or have any feature requests.

Enjoy!

*तन्मय खन्ना*
*Tanmai Khanna*
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff