Hi Achim

You could look at the Moses 'zones and walls' feature:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc14

Also, there has been some work on translating web pages with Moses, which uses your option (3) below.
http://www.statmt.org/moses/?n=Moses.WebTranslation

best regards
Barry

On Thursday 09 September 2010 04:09, Achim Ruopp wrote:
> Hi,
>
> In my projects I have quite a bit of inline formatting that Moses is not
> able to handle out of the box. I plan to write code that preserves inline
> formatting in formats like the Rich Text Format during translation, as part
> of the Moses for Localization open source project
> (http://groups.google.com/group/m4loc).
>
> E.g. I want to translate sentences like this:
>
>     This is some really bold text.
>
> This is marked up in Rich Text Format like this:
>
>     This is some {\b really bold} text.
>
> Typical for such inline formatting is that the formatting markup is paired
> and can be nested, i.e. you could have something like:
>
>     This is some {\b really bold {\i and also italic}} text.
>
> Sometimes there is also unmatched inline formatting.
>
> The ideas I have to do this with a (phrase-based) Moses system are:
>
> 1. Wrap the markup in XML and use the Moses -xml-input exclusive
>    option to insert the markup into the translation, i.e. translate
>
>     This is some <m translation="{\b">{\b</m> really bold <m translation="}">}</m> text.
>
>    The issue is that during translation the markup gets jumbled through
>    phrase reordering: closing tags could move before opening tags, and
>    nested constructs could get distorted. I'd have to come up with a smart
>    algorithm to fix these rearrangements.
>
> 2. Transform the markup into XML markup and use the Moses -xml-input
>    exclusive option to preserve the markup, similar to specifying
>    reordering constraints (see
>    http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc14)
>
>     This is some <bold>really bold</bold> text.
>
>    After translation, transform the XML markup back into the right markup
>    for the format (e.g. <bold> -> {\b). Will the XML be deleted during
>    translation?
>
> 3. Remove any formatting from the text before translation and use the
>    decoder's extended output option (-t) to determine which target language
>    phrases were generated by which source language phrases. Use this
>    information to project the formatting information onto the target
>    sentence.
>
> Is there a best option among the three above? Why? Are there other options
> that I missed?
>
> Thanks in advance!
>
> If you are interested in the topic and would like to participate, please
> small 'r'. I'm looking for collaborators.
>
> Achim

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
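
To make option (3) a bit more concrete, here is a minimal Python sketch of the projection step. It is not taken from the Moses web translation code. It assumes the -t output has the shape "target words |start-end| ...", where start and end are 0-based source token positions, and the function names, the German example line and the bold span (source tokens 3 to 4) are all made up for illustration.

```python
import re

# Assumed shape of one "-t" output line: target words followed by the
# source span that produced them, e.g. "das ist |0-1| ein haus |2-3|".
SEGMENT = re.compile(r"(.*?)\s*\|(\d+)-(\d+)\|")


def parse_segmentation(line):
    """Return a list of (target_tokens, (src_start, src_end)) pairs."""
    return [
        (m.group(1).split(), (int(m.group(2)), int(m.group(3))))
        for m in SEGMENT.finditer(line)
    ]


def project_span(segments, span_start, span_end, open_mark, close_mark):
    """Wrap every target phrase whose source span overlaps the formatted
    source span [span_start, span_end] in open_mark ... close_mark.

    Works at phrase granularity: a phrase that only partly overlaps the
    span is formatted as a whole. If reordering splits the span, several
    separate groups are emitted.
    """
    out = []
    inside = False
    for tokens, (start, end) in segments:
        overlaps = start <= span_end and end >= span_start
        if overlaps and not inside:
            out.append(open_mark)
            inside = True
        elif not overlaps and inside:
            out.append(close_mark)
            inside = False
        out.extend(tokens)
    if inside:
        out.append(close_mark)
    return " ".join(out)


# Source (formatting stripped): this is some really bold text .
# Bold covered source tokens 3-4 ("really bold").
# Hypothetical decoder output, invented for this example:
decoder_line = "dies ist |0-1| etwas |2-2| wirklich fetter |3-4| Text |5-5| . |6-6|"
segments = parse_segmentation(decoder_line)
print(project_span(segments, 3, 4, r"{\b", "}"))
# prints: dies ist etwas {\b wirklich fetter } Text .
```

Nested formatting could be handled by projecting each span separately and merging the marks afterwards. Detokenization and unmatched markup are not covered by this sketch.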
