Hi, 

In my projects I have quite a bit of inline formatting that Moses is not
able to handle out of the box. I plan to write code that preserves inline
formatting in formats like the Rich Text Format during translation, as part
of the Moses for Localization open source project
(http://groups.google.com/group/m4loc).

 

E.g. I want to translate sentences like this:

This is some really bold text.

This is marked up in Rich Text Format like this:

This is some {\b really bold} text.

 

Such inline formatting markup is typically paired and can be nested, i.e.
you could have something like:

This is some {\b really bold {\i and also italic}} text.

Sometimes there is also unmatched inline formatting.
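
To make the pairing and nesting concrete, here is a minimal Python sketch
that reports unmatched group braces in an RTF-like string. The function name
find_unmatched_braces is just something I made up for illustration, and it
only looks at braces (skipping the escaped forms \{, \} and \\), so it is
not a real RTF parser.

```python
def find_unmatched_braces(text):
    """Return (unmatched_open_positions, unmatched_close_positions)."""
    open_stack = []        # positions of '{' still waiting for a '}'
    unmatched_close = []   # positions of '}' that had no open partner
    i = 0
    while i < len(text):
        ch = text[i]
        # Skip escaped braces and backslashes: \{  \}  \\
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in "{}\\":
            i += 2
            continue
        if ch == "{":
            open_stack.append(i)
        elif ch == "}":
            if open_stack:
                open_stack.pop()
            else:
                unmatched_close.append(i)
        i += 1
    return open_stack, unmatched_close


nested = "This is some {\\b really bold {\\i and also italic}} text."
print(find_unmatched_braces(nested))       # ([], [])
print(find_unmatched_braces("a } b { c"))  # ([6], [2])
```

The same check could be run on the decoder output to detect whether
translation has broken the pairing.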

 

The ideas I have to do this with a (phrase-based) Moses system are:

1.       Wrap the markup in XML and use the Moses -xml-input exclusive
option to insert the markup into the translation, i.e. translate
This is some <m translation="{\b">{\b</m> really bold <m
translation="}">}</m> text.

The issue is that during translation the markup gets jumbled through phrase
reordering: closing tags could move before opening tags, and nested
constructs could get distorted. I'd have to come up with a smart algorithm
to fix these rearrangements.

2.       Transform the markup into XML markup and use the Moses -xml-input
exclusive option to preserve the markup, similar to specifying reordering
constraints (see
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc14):
This is some <bold> really bold</bold> text.
After translation, transform the XML markup back into the right markup for
the format (e.g. <bold> -> {\b). Will the XML be deleted during translation?

3.       Remove any formatting from the text before translation and use the
decoder extended output option (-t) to determine which target language
phrases were generated by which source language phrases. Use this
information to project the formatting information to the target sentence.
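
As a rough illustration of option 3, here is a Python sketch of the
projection step. It assumes that the -t output has the form
"target words |i-j|", where i-j is the 0-based source word span of each
phrase. The function names parse_trace and project, and the German example
output, are made up for illustration and do not come from Moses.

```python
import re

# Matches a segmentation marker such as |3-4|
SEGMENT = re.compile(r"\|(\d+)-(\d+)\|")


def parse_trace(line):
    """Split 'w1 w2 |0-1| w3 |2-2|' into [(['w1', 'w2'], 0, 1), (['w3'], 2, 2)]."""
    phrases = []
    pos = 0
    for match in SEGMENT.finditer(line):
        words = line[pos:match.start()].split()
        phrases.append((words, int(match.group(1)), int(match.group(2))))
        pos = match.end()
    return phrases


def project(phrases, span_start, span_end, open_tag, close_tag):
    """Wrap every run of target phrases whose source span overlaps
    the formatted source words span_start..span_end (inclusive)."""
    out = []
    inside = False
    for words, src_start, src_end in phrases:
        overlaps = src_start <= span_end and src_end >= span_start
        if overlaps and not inside:
            out.append(open_tag)
            inside = True
        elif not overlaps and inside:
            out.append(close_tag)
            inside = False
        out.extend(words)
    if inside:
        out.append(close_tag)
    return " ".join(out)


# Source tokens: This(0) is(1) some(2) really(3) bold(4) text(5) .(6)
# Bold covers source words 3-4. The trace line below is hypothetical.
trace = "Dies ist |0-1| ein |2-2| wirklich fetter |3-4| Text |5-5| . |6-6|"
print(project(parse_trace(trace), 3, 4, "{\\b", "}"))
# Dies ist ein {\b wirklich fetter } Text .
```

Note that this works at phrase granularity only: if a phrase covers both
formatted and unformatted source words, the whole target phrase gets
formatted, and if reordering splits the formatted words apart, the sketch
emits more than one pair of tags.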

 

Is there a best option among the three above? Why? Are there other options
that I missed?

 

Thanks in advance!

 

If you are interested in the topic and would like to participate, please
small 'r'. I'm looking for collaborators.

 

Achim



 

 

 

