Hi Tom,

Although I think command-line tools such as sed can convert the
problematic characters, I've installed DoMY CE and tried it. But I wonder
how I can use the two plugin modules. Do you have any documentation for them?
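
A minimal sketch of the kind of conversion meant here, written in Python
rather than sed. The entity-style mapping and the names ESCAPES,
escape_line and unescape_line are assumptions made for illustration; they
are not taken from the DoMY plugins or from the table Tom linked, so the
replacements those tools actually use may differ. Only the four characters
Hieu mentioned are covered, plus "&", which has to be escaped as well so
that the round trip stays reversible.

```python
# Hypothetical mapping, NOT the DoMY plugin table: covers < > [ ] and "&".
# "&" comes first so that the entities added afterwards are not re-escaped.
ESCAPES = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_line(line):
    """Replace the problematic characters before training or translating."""
    for raw, entity in ESCAPES:
        line = line.replace(raw, entity)
    return line


def unescape_line(line):
    """Undo escape_line on translated output (reverse order, "&" last)."""
    for raw, entity in reversed(ESCAPES):
        line = line.replace(entity, raw)
    return line


if __name__ == "__main__":
    sample = 'h and at < 0 " C for 18 h .'
    escaped = escape_line(sample)
    print(escaped)
    print(unescape_line(escaped) == sample)
```

The same escaping would have to be applied to both sides of the training
corpus and to any text sent for translation, with the un-escaping applied
to the translated output.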

Best regards,
-- 
Hwidong Na <[email protected]>
KLE lab, POSTECH, KOREA

2011-02-14 (Mon), 14:21 +0700, Tom Hoar:
> Hi hwidong,
> 
>  This link lists a table of problematic characters and character 
>  sequences that must be removed or escaped before training a translation 
>  model, and before translating your new work.
> 
>  DoMY includes two plugin modules, replace-escape-control.py and 
>  replace-unescape-control.py, that escape and un-escape these characters.
> 
>  
> http://www.precisiontranslationtools.com/index.php?option=com_content&view=article&id=94:are-there-characters-that-cause-problems-in-moses&catid=30:key-concepts&Itemid=57
> 
>  Regards,
>  Tom
> 
> 
>  On Mon, 14 Feb 2011 13:29:54 +0800, Hieu Hoang <[email protected]> 
>  wrote:
> > Hi hwidong
> >
> > You probably have to preprocess the corpus to get rid of the < and >
> > symbols, as well as the [ and ] symbols.
> >
> > Hieu
> > Sent from my flying horse
> >
> > On 14 Feb 2011, at 11:30 AM, Hwidong Na <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> When I extracted hierarchical phrases using the EMS, the extraction
> >> step crashed, and it seems to identify XML tags during the
> >> extraction.
> >> For example, one of the error messages is
> >>
> >> ERROR: malformed XML: It was kept in the ice bath for 30 min , at
> >> ambient temperature for 2 h and at < 0 " C for 18 h . It was then
> >> diluted with CH2Cl2 , washed with water and brine , dried ( MgSO4 ) 
> >> and
> >> concentrated .
> >> no target (0) or source (43) words << end insentence 993688
> >> T: It was kept in the ice bath for 30 min , at ambient temperature 
> >> for 2
> >> h and at < 0 " C for 18 h . It was then diluted with CH2Cl2 , washed
> >> with water and brine , dried ( MgSO4 ) and concentrated .
> >> S: 将 其 在 冰浴 中 放置 30 分钟 , 室温 放置 2 小时 , 然后 在 < 0 ℃
> >> 下 放置 18 小时 。 将 其 用 CH2Cl2 稀释 , 用水 和 盐 水 洗涤 , 干燥
> >> ( MgSO4 ) 并 浓缩 。
> >>
> >> The revision number is 3729. Should I update to the newest revision?
> >>
> >> Best regards,
> >> --
> >> Hwidong Na <[email protected]>
> >> KLE lab, POSTECH, KOREA
> >>
> >> _______________________________________________
> >> Moses-support mailing list
> >> [email protected]
> >> http://mailman.mit.edu/mailman/listinfo/moses-support
> 
