Taylor,

you can have a look at the M4Loc project (http://code.google.com/p/m4loc/). We are working on pre- and post-processing scripts that preserve inline formatting like the tags you describe. Moses itself has an option to wrap non-translatable text, such as these tags, in XML markup (http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc4), but this does not address how to treat the tags during tokenization and recasing.
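To illustrate the XML-markup route, here is a minimal Python sketch. It is not M4Loc's code. It assumes the in-house tags look like `<bold>`, `</bold>` or `<indent/>` with no attributes, and that the markup is added *after* tokenization and recasing, since the tokenizer would otherwise break it apart. Each tag is swapped for a placeholder token. That token is wrapped in Moses XML markup whose `translation` attribute is the same placeholder. After decoding, the placeholders are mapped back to the original tags. The element name `tag` and the `__TAGn__` placeholder format are hypothetical choices for this example.

```python
import re

# Hypothetical pattern for attribute-less in-house tags: <bold>, </bold>, <indent/>.
TAG_RE = re.compile(r"</?[A-Za-z][A-Za-z0-9_-]*\s*/?>")


def protect_tags(sentence):
    """Replace each inline tag with a placeholder wrapped in Moses XML markup.

    Returns the marked-up sentence and a placeholder -> original tag mapping.
    """
    mapping = {}

    def _wrap(match):
        placeholder = f"__TAG{len(mapping)}__"
        mapping[placeholder] = match.group(0)
        # The element name ("tag") is an arbitrary, hypothetical choice; the
        # translation attribute tells Moses what to output for this span.
        return f'<tag translation="{placeholder}">{placeholder}</tag>'

    return TAG_RE.sub(_wrap, sentence), mapping


def restore_tags(translation, mapping):
    """Put the original tags back in place of their placeholders."""
    for placeholder, tag in mapping.items():
        translation = translation.replace(placeholder, tag)
    return translation
```

You would decode the marked-up text with something like `moses -f moses.ini -xml-input exclusive`, so that only the supplied translation is used for each wrapped span. You would then call `restore_tags` on each output line. This keeps the tags in the output, but it does not by itself guarantee they end up in the right position if the decoder reorders the surrounding words.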
Achim

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Taylor Rose
Sent: Thursday, September 08, 2011 11:30 AM
To: [email protected]
Subject: [Moses-support] Ignoring Symbols?

Hello,

I've recently started working with Moses as part of my new internship. The company I work for uses in-house formatting tags in documents (i.e. paragraph, bold, indent, etc.). Is there a way I can make Moses ignore these and keep them in the correct position after translation?

My first thoughts were to somehow tell Moses that <bold> in English should translate to <bold> in Spanish, but I haven't found a way to do this, if it is even possible.

I'm still learning Moses, so please hold off on the RTFMs. The website is huge and I've only scratched the surface of the documentation. I would appreciate any links you could provide to relevant documents.

Thanks,
--
Taylor Rose
Machine Translation Intern
Language Intelligence

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
