Hi,
In my projects I have quite a bit of inline formatting that Moses is not
able to handle out-of-the-box. I plan to write code that preserves inline
formatting in formats like the Rich Text format during translation as part
of the Moses for Localization open source project
(http://groups.google.com/group/m4loc).
E.g. I want to translate sentences like this:
This is some really bold text.
This is marked up in Rich Text Format like this:
This is some {\b really bold} text.
Typical for such inline formatting is that the formatting markup is paired
and it can be nested, i.e. you could have something like:
This is some {\b really bold {\i and also italic}} text.
Sometimes there is also unmatched inline formatting.
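To make the paired/nested structure concrete, here is a minimal Python sketch
of how the markup could be separated from the text. It assumes a tiny RTF
subset in which "{\word " opens a group and "}" closes it; strip_markup and
the (tag, first_word, end_word) span layout are names I made up for
illustration, not part of Moses or m4loc.

```python
import re

# Assumption: only "{\word " openers and "}" closers occur (tiny RTF subset).
TOKEN = re.compile(r"(\{\\[a-z]+ ?|\})")

def strip_markup(text):
    """Return (words, spans); each span is (tag, first_word, end_word_exclusive)."""
    words, spans, stack = [], [], []
    for piece in TOKEN.split(text):
        if not piece:
            continue
        if piece.startswith("{\\"):
            # Remember the tag name and the word index where the group starts.
            stack.append((piece.strip()[2:], len(words)))
        elif piece == "}":
            if stack:  # an unmatched closing brace is simply ignored
                tag, start = stack.pop()
                spans.append((tag, start, len(words)))
        else:
            words.extend(piece.split())
    # Unmatched openers are treated as running to the end of the sentence.
    while stack:
        tag, start = stack.pop()
        spans.append((tag, start, len(words)))
    return words, spans

words, spans = strip_markup(
    r"This is some {\b really bold {\i and also italic}} text.")
print(" ".join(words))  # This is some really bold and also italic text.
print(spans)            # [('i', 5, 8), ('b', 3, 8)]
```

Inner groups are closed first, so they appear before their enclosing group in
the span list.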
The ideas I have to do this with a (phrase-based) Moses system are:
1. Wrap the markup in XML and use the Moses -xml-input exclusive
option to insert the markup into the translation, i.e. translate
This is some <m translation="{\b">{\b</m> really bold <m
translation="}">}</m> text.
The issue is that during decoding the markup gets jumbled through phrase
reordering: closing tags could move before opening tags, and nested
constructs could get distorted. I'd have to come up with a smart algorithm
to repair these rearrangements.
2. Transform the markup into XML markup and use the Moses -xml-input
exclusive option to preserve the markup similar to specifying reordering
constraints (see
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc14)
This is some <bold> really bold</bold> text.
After translation, transform the XML markup back into the right markup for
the format (e.g. <bold> -> {\b). Will the XML be deleted during translation?
3. Remove any formatting from the text before translation and use the
decoder extended output option (-t) to determine which target language
phrases were generated by which source language phrases. Use this
information to project the formatting information to the target sentence.
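A rough Python sketch of how the projection in idea 3 might work. It assumes
that the traced decoder output puts a source word range of the form
|first-last| after each target phrase, and that the formatting was recorded
beforehand as (tag, first_source_word, end_source_word_exclusive) spans. The
function names and the German trace line are made up for illustration; this
is not actual Moses output or an existing m4loc API.

```python
import re

# Assumption: each target phrase is followed by "|first-last|" (source words).
SEGMENT = re.compile(r"(.*?)\s*\|(\d+)-(\d+)\|\s*")

def parse_trace(line):
    """Split traced output into (target_words, src_first, src_last) phrases."""
    return [(m.group(1).split(), int(m.group(2)), int(m.group(3)))
            for m in SEGMENT.finditer(line)]

def project(trace_line, spans):
    """Re-insert RTF-style groups around the target phrases they map to."""
    # Outer spans first, so groups are opened in a properly nested order.
    ordered = sorted(spans, key=lambda s: (s[1], -s[2]))
    out, open_tags = [], []
    for words, first, last in parse_trace(trace_line):
        # Tags whose source span overlaps the source words of this phrase.
        wanted = [tag for tag, s, e in ordered if first < e and last >= s]
        # Keep the groups both lists share, close the rest, open the new ones.
        common = 0
        while (common < len(open_tags) and common < len(wanted)
               and open_tags[common] == wanted[common]):
            common += 1
        out.extend(["}"] * (len(open_tags) - common))
        out.extend("{\\" + tag for tag in wanted[common:])
        out.extend(words)
        open_tags = wanted
    out.extend(["}"] * len(open_tags))
    # Attach closing braces to the preceding word.
    return " ".join(out).replace(" }", "}")

# Hypothetical trace for "This is some really bold and also italic text."
trace = ("Dies ist |0-1| etwas |2-2| wirklich fetter |3-4| "
         "und auch kursiver |5-7| Text. |8-8|")
spans = [("i", 5, 8), ("b", 3, 8)]
print(project(trace, spans))
# Dies ist etwas {\b wirklich fetter {\i und auch kursiver}} Text.
```

Because groups are opened and closed per target phrase, the result is always
well nested, but a source span whose phrases end up non-adjacent in the
target is split into several groups. Formatting is also only as precise as
the phrase boundaries: a phrase that is partly bold in the source becomes
entirely bold in the target.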
Is there a best option among the three above? Why? Are there other options
that I missed?
Thanks in advance!
If you are interested in the topic and would like to participate, please
small 'r'. I'm looking for collaborators.
Achim