Hey guys,
Using wordbound blanks, we've modified the Apertium pipeline, modules and
stream format so that inline markup tags now move with their words in
transfer, merge when lexical units (LUs) merge, and split when LUs split,
preserving the formatting of the input document. If you want to follow the
further development of this project, see
<https://wiki.apertium.org/wiki/User:Khannatanmai/Wordbound_blanks>.

We have a decent version ready to test that handles markup in HTML
documents. It will undergo extensive testing as part of this project, but I
thought it would be a good idea to let the community test it on their own
language pairs, based on their needs, so that we can understand what
features need to be added and what needs to be fixed. Apertium users have
been asking for markup handling for quite some time and have had no option
but to use wrappers that try to guess alignments. I'm hoping this project
helps in that regard. Here's what you need to test this:
- Make sure you have the latest commits of apertium and lttoolbox installed.
- Install the latest commits of apertium-separable, apertium-anaphora, etc.
if you're using those in your mode.
- Clone and install https://github.com/TinoDidriksen/transfuse .
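
Before testing, it can help to confirm that the tools are actually on your
PATH. This is only a minimal sketch: it assumes a default install exposes
`apertium`, `lt-proc` (from lttoolbox) and `tf-html-fragment` (from
transfuse) as commands, and `missing_tools` is a made-up helper name, not
something shipped with Apertium. It checks presence only, not whether you
have the latest commits.

```shell
# Print the name of every given command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of Apertium.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools mentioned in this thread; extend the list for your own mode.
missing_tools apertium lt-proc tf-html-fragment
```

If nothing is printed, all three commands were found.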

After this, all you need to do is pipe your HTML document to
tf-html-fragment and pass, as its argument, the translation mode file of
your language pair of choice (full translation modes only).

Example:

$ echo 'Hello <b>big green</b> <i>world</i>!' | tf-html-fragment \
    /Users/khannatanmai/Documents/GSoC/repo/main/apertium-eng-spa/modes/eng-spa.mode


Hola <i>Mundo</i> <b>verde grande</b> !
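
When testing your own pair, a quick sanity check is whether every inline
tag from the input survives in the output. The sketch below is one rough
way to do that with standard tools; `list_tags` and `same_tags` are
hypothetical helper names, and the check only compares which tags are
present, not whether they ended up on the right words.

```shell
# List every tag in the HTML read from stdin, one per line, sorted.
list_tags() {
    grep -o '<[^>]*>' | sort
}

# Succeed if both strings contain the same multiset of tags.
# same_tags is a hypothetical helper for testing, not part of transfuse.
same_tags() {
    [ "$(printf '%s\n' "$1" | list_tags)" = "$(printf '%s\n' "$2" | list_tags)" ]
}

src='Hello <b>big green</b> <i>world</i>!'
out='Hola <i>Mundo</i> <b>verde grande</b> !'   # output from the example above

if same_tags "$src" "$out"; then
    echo "tags preserved"
else
    echo "tags differ"
fi
```

To run it against a real translation, replace the hard-coded `out` with the
result of piping `$src` through tf-html-fragment with your own mode file.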


It only works for HTML right now, but we're in the process of adding
support for all the usual document types.


*Known issues:*

- If a transfer rule has multiple words in its pattern and the output
contains an LU that wasn't clipped from any word in the input, that LU
won't get a wordbound blank.

- If apertium-separable detects a string of words, the formatting of each
word is combined and applied to the entire string of words.

- apertium-recursive isn't supported yet. It will be by the end of the
project, though.


If you have any questions or suggestions, I'd be glad to respond to them on
this thread. If you need help testing this on your language pair, you can
contact us on IRC. The same goes if you find any bugs or have any feature
requests.


Enjoy!
*तन्मय खन्ना *
*Tanmai Khanna*
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
