Hi Achim

You could look at the Moses 'zones and walls' feature:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc14
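
To make this concrete, here is a minimal sketch of how paired RTF groups might be rewritten as zones before decoding. It assumes the decoder is run with -xml-input enabled so that <zone> tags are interpreted; the helper name rtf_to_zones and the simplified control-word pattern are hypothetical, and literal or escaped braces in the text are not handled.

```python
import re

# Assumption: markup is limited to groups of the form {\word ...} and a bare }.
_TOKEN = re.compile(r"(\{\\[a-z]+ ?|\})")

def rtf_to_zones(text):
    """Return (moses_input, tags).

    moses_input has each RTF group replaced by a <zone> ... </zone> span;
    tags lists the RTF control words in the order the groups were opened.
    """
    out, tags, depth = [], [], 0
    for piece in _TOKEN.split(text):
        if not piece:
            continue
        if piece.startswith("{\\"):
            tags.append(piece[2:].strip())   # e.g. "b" for {\b
            out.append(" <zone> ")
            depth += 1
        elif piece == "}":
            if depth == 0:
                raise ValueError("unmatched closing brace")
            out.append(" </zone> ")
            depth -= 1
        else:
            out.append(piece)
    if depth:
        raise ValueError("unclosed group")
    # Collapse the extra whitespace introduced around the tags.
    return " ".join("".join(out).split()), tags

if __name__ == "__main__":
    print(rtf_to_zones(r"This is some {\b really bold} text."))
```

The zones only keep each span together during reordering, so the saved control words still have to be put back into the output by some other means.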

Also, there has been some work on translating web pages with Moses, which uses
your option (3) below.
http://www.statmt.org/moses/?n=Moses.WebTranslation
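
As a rough illustration of option (3), the sketch below projects one formatting span onto the target side using the phrase segmentation that the decoder prints with -t. It assumes each target phrase is followed by its source span in the form |start-end| with 0-based token indices; the function names and the German sample are made up for illustration and are not taken from the web translation code.

```python
import re

_SPAN = re.compile(r"\|(\d+)-(\d+)\|")

def parse_trace(traced):
    """Split 'tgt words |s-e| ...' into [(target_tokens, (s, e)), ...]."""
    phrases, words = [], []
    for tok in traced.split():
        m = _SPAN.fullmatch(tok)
        if m:
            phrases.append((words, (int(m.group(1)), int(m.group(2)))))
            words = []
        else:
            words.append(tok)
    return phrases

def project(traced, marked, open_tag="{\\b ", close_tag="}"):
    """Wrap every target phrase whose source span covers a marked source index.

    marked is the set of source token indices that carried the formatting.
    Adjacent wrapped phrases are merged into a single group.
    """
    out, inside = [], False
    for words, (start, end) in parse_trace(traced):
        if not words:
            continue
        hit = any(start <= i <= end for i in marked)
        if hit and not inside:
            words = [open_tag + words[0]] + words[1:]
        elif not hit and inside:
            out[-1] += close_tag
        inside = hit
        out.extend(words)
    if inside:
        out[-1] += close_tag
    return " ".join(out)

if __name__ == "__main__":
    # Source: This(0) is(1) some(2) really(3) bold(4) text(5) .(6)
    traced = "dies ist |0-1| etwas |2-2| wirklich fetter |3-4| Text . |5-6|"
    print(project(traced, {3, 4}))
```

The granularity is the phrase, so a target phrase is wrapped as a whole even if only part of its source span was formatted, and a span that is split by reordering comes out as several separate groups.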

best regards
Barry

On Thursday 09 September 2010 04:09, Achim Ruopp wrote:
> Hi,
>
> In my projects I have quite a bit of inline formatting that Moses is not
> able to handle out-of-the-box. I plan to write code that preserves inline
> formatting in formats like the Rich Text format during translation as part
> of the Moses for Localization open source project
> (http://groups.google.com/group/m4loc).
>
>
>
> E.g. I want to translate sentences like this:
>
> This is some really bold text.
>
> This is marked up in Rich Text Format like this:
>
> This is some {\b really bold} text.
>
>
>
> Typical for such inline formatting is that the formatting markup is paired
> and it can be nested, i.e. you could have something like:
>
> This is some {\b really bold {\i and also italic}} text.
>
> Sometimes there is also unmatched inline formatting.
>
>
>
> The ideas I have to do this with a (phrase-based) Moses system are:
>
> 1.       Wrap the markup in XML and use the Moses -xml-input exclusive
> option to insert the markup into the translation, i.e. translate
> This is some <m translation="{\b">{\b</m> really bold <m
> translation="}">}</m> text.
>
> The issue is that during translation the markup gets jumbled through phrase
> reordering: closing tags could move before opening tags, and nested
> constructs could get distorted. I'd have to come up with a smart algorithm
> to fix these rearrangements.
>
> 2.       Transform the markup into XML markup and use the Moses -xml-input
> exclusive option to preserve the markup similar to specifying reordering
> constraints (see
> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc14)
> This is some <bold> really bold</bold> text.
> After translation, transform the XML markup back into the right markup for
> the format (e.g. <bold> -> {\b). Will the XML be deleted during translation?
>
> 3.       Remove any formatting from the text before translation and use the
> decoder extended output option (-t) to determine which target language
> phrases were generated by which source language phrases. Use this
> information to project the formatting information to the target sentence.
>
>
>
> Is there a best option among the three above? Why? Are there other options
> that I missed?
>
>
>
> Thanks in advance!
>
>
>
> If you are interested in the topic and would like to participate, please
> small 'r'. I'm looking for collaborators.
>
>
>
> Achim

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support