Hi Tom,

Although I think command-line tools such as sed can convert the problematic characters, I've installed DoMY CE and tried it. However, I am not sure how to use the two plugin modules. Do you have any documentation for them?
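
For illustration, the sed-style conversion mentioned above could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the entity-style replacements (`&lt;`, `&gt;`, `&#91;`, `&#93;`) and the function names `escape_line` and `unescape_line` are hypothetical, and the mapping that replace-escape-control.py and replace-unescape-control.py actually use is not shown in this thread.

```python
# Hypothetical escape table: these replacements are an assumption for
# illustration, not the mapping used by the DoMY plugin modules.
ESCAPES = [
    ("&", "&amp;"),   # must come first so later entities are not double-escaped
    ("<", "&lt;"),
    (">", "&gt;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_line(line: str) -> str:
    """Replace characters that the extractor may read as markup."""
    for raw, escaped in ESCAPES:
        line = line.replace(raw, escaped)
    return line


def unescape_line(line: str) -> str:
    """Undo escape_line; reverse order so '&amp;' is restored last."""
    for raw, escaped in reversed(ESCAPES):
        line = line.replace(escaped, raw)
    return line


if __name__ == "__main__":
    sample = 'h and at < 0 " C for 18 h .'
    print(escape_line(sample))
```

The same escaping would have to be applied to both sides of the training corpus and to any input sent for translation, with the un-escape step run on the output.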

Best regards,
--
Hwidong Na <[email protected]>
KLE lab, POSTECH, KOREA

2011-02-14 (Mon), 14:21 +0700, Tom Hoar:
> Hi hwidong,
>
> This link lists a table of problematic characters and character
> sequences that must be removed or escaped before training a translation
> model, and before translating your new work.
>
> DoMY includes two plugin modules, replace-escape-control.py and
> replace-unescape-control.py, that escape and un-escape these characters.
>
> http://www.precisiontranslationtools.com/index.php?option=com_content&view=article&id=94:are-there-characters-that-cause-problems-in-moses&catid=30:key-concepts&Itemid=57
>
> Regards,
> Tom
>
> On Mon, 14 Feb 2011 13:29:54 +0800, Hieu Hoang <[email protected]>
> wrote:
> > Hi hwidong
> >
> > You probably have to preprocess the corpus to get rid of < and >
> > symbols, as well as [ and ] symbols
> >
> > Hieu
> > Sent from my flying horse
> >
> > On 14 Feb 2011, at 11:30 AM, Hwidong Na <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> When I extract hierarchical phrases using the EMS, the extraction
> >> step crashes, and it seems to identify XML tags during the
> >> extraction. For example, one of the error messages is
> >>
> >> ERROR: malformed XML: It was kept in the ice bath for 30 min , at
> >> ambient temperature for 2 h and at < 0 " C for 18 h . It was then
> >> diluted with CH2Cl2 , washed with water and brine , dried ( MgSO4 )
> >> and concentrated .
> >> no target (0) or source (43) words << end insentence 993688
> >> T: It was kept in the ice bath for 30 min , at ambient temperature
> >> for 2 h and at < 0 " C for 18 h . It was then diluted with CH2Cl2 ,
> >> washed with water and brine , dried ( MgSO4 ) and concentrated .
> >> S: 将 其 在 冰浴 中 放置 30 分钟 , 室温 放置 2 小时 , 然后 在 < 0 ℃
> >> 下 放置 18 小时 。 将 其 用 CH2Cl2 稀释 , 用水 和 盐 水 洗涤 , 干燥
> >> ( MgSO4 ) 并 浓缩 。
> >>
> >> The revision number is 3729. Should I update to the newest revision?
> >>
> >> Best regards,
> >> --
> >> Hwidong Na <[email protected]>
> >> KLE lab, POSTECH, KOREA
> >>
> >> _______________________________________________
> >> Moses-support mailing list
> >> [email protected]
> >> http://mailman.mit.edu/mailman/listinfo/moses-support
