Hey Xavi,

You're right, post-generation merging is an issue right now, and I forgot to mention it in the list of known issues. I'm working on it now, so that when words merge in postgen, their wordbound blanks are combined and added to the output.
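
To make the intended behaviour concrete, here is a minimal sketch of combining wordbound blanks when postgen merges words. It is only an illustration, not the actual lttoolbox code: the `Token` tuple, `merge_tokens` and `render` are hypothetical names, and both the `[[...]]` rendering and the `; ` separator between combined blanks are assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory form of a generated word:
# (contents of its wordbound blank, or None) and its surface form.
Token = Tuple[Optional[str], str]


def merge_tokens(tokens: List[Token], merged_surface: str, sep: str = "; ") -> Token:
    """Merge several tokens into one, combining their wordbound blanks.

    Blanks are kept in input order and duplicates are dropped, so a
    format shared by all the merged words appears only once.
    """
    seen: List[str] = []
    for blank, _surface in tokens:
        if blank is None:
            continue
        # A blank may already be a combination of several formats.
        for part in blank.split(sep.strip()):
            part = part.strip()
            if part and part not in seen:
                seen.append(part)
    combined = sep.join(seen) if seen else None
    return (combined, merged_surface)


def render(token: Token) -> str:
    """Write a token back out with its blank in front (assumed syntax)."""
    blank, surface = token
    return surface if blank is None else f"[[{blank}]]{surface}"


if __name__ == "__main__":
    # Spanish "de" + "el" contract to "del"; each word carried its own format.
    merged = merge_tokens([("t:b:123", "de"), ("t:i:456", "el")], "del")
    print(render(merged))
```

In this sketch the merged word ends up carrying both formats, which is the same idea as what already happens when LUs merge in transfer.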

tf-close, however, will remain before the generation step. After generation, the tokenisation of the input is lost (as the LUs are removed), so tf-close adds a delimiter that tells the reformatter the span of the format.

Hope that answers your question.

Thanks and Regards,
*तन्मय खन्ना *
*Tanmai Khanna*

On Tue, Jul 21, 2020 at 4:57 AM Xavi Ivars <xavi.iv...@gmail.com> wrote:

> Hello! I just found a (potential) issue, and wanted to double check with
> you (it's probably something you already looked into and is not a real
> issue): looking at transfuse's code, I saw tf-mangle-mode is doing tf-close
> on the generation step.
>
> How does it work when the postgen step merges some of the words?
>
> --
> Xavi Ivars
> < http://xavi.ivars.me >
>
> On Mon, 20 Jul 2020, 20:34, Tanmai Khanna <khanna.tan...@gmail.com>
> wrote:
>
>> Hey guys,
>> Using wordbound blanks, we've modified the Apertium pipeline, modules and
>> stream such that inline markup tags now move around with words in transfer,
>> merge when LUs merge, and split when LUs split, to preserve the formatting of
>> the input document. If you want to follow the further development of this
>> project, see here
>> <https://wiki.apertium.org/wiki/User:Khannatanmai/Wordbound_blanks>.
>>
>> We have a decent version that is ready to test and does markup handling
>> for html documents. It will undergo extensive testing as part of this
>> project, but I thought it would be a good idea to let the community test it
>> themselves on their language pairs based on their needs, so that we can
>> understand what features need to be added and what needs to be fixed.
>> Apertium users have been asking for markup handling for quite some time now
>> and had no other option but to use wrappers that try to guess alignments.
>> I'm hoping this project helps in that regard. Here's what you need to test
>> this:
>> - Make sure you have the latest commits of apertium and lttoolbox
>> installed.
>> - Latest commits of -separable, -anaphora, etc. if you're using those in
>> your mode.
>> - Clone and install https://github.com/TinoDidriksen/transfuse .
>>
>> After this, all you need to do is pipe your html document to
>> tf-html-fragment and give as argument a translation mode of your language
>> pair of choice (full translation modes).
>>
>> Example:
>>
>> $ echo 'Hello <b>big green</b> <i>world</i>!' | tf-html-fragment
>> /Users/khannatanmai/Documents/GSoC/repo/main/apertium-eng-spa/modes/eng-spa.mode
>>
>> Hola <i>Mundo</i> <b>verde grande</b> !
>>
>> It only works for html right now, but we're in the process of supporting
>> all usual document types.
>>
>> *Known issues:*
>>
>> - If a transfer rule has multiple words in the pattern, and in the output
>> there is an LU that wasn't clipped from any word in the input, it won't put
>> a wordbound blank on that LU.
>>
>> - If -separable detects a string of words, then the format of each will be
>> combined and added on the entire string of words.
>>
>> - apertium-recursive isn't supported as of now. It will be by the end of
>> the project though.
>>
>> If you have any questions or suggestions, I'd be glad to respond to them on
>> this thread. If you need help testing this on your language pair, you can
>> contact us on the IRC. Same if you find any bugs or have any feature
>> requests.
>>
>> Enjoy!
>> *तन्मय खन्ना *
>> *Tanmai Khanna*
_______________________________________________ Apertium-stuff mailing list Apertium-stuff@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/apertium-stuff