Ken,

I traced this. The problem is not specific to MGIZA++. It first appeared with merge_alignment.py in step 2 of train-model.perl. It happens again in step 5 of train-model.perl, when "extract" redirects its output to a path. The redirection problem does not occur with the 2- or 3-byte UTF-8 Latin, Japanese, or Chinese characters that I tried. All Thai UTF-8 characters are 3 bytes, and there are many documented cases where UTF handlers don't handle Thai properly. Perl's system() call appears to be one of these cases.

I'm working through the problem on stackoverflow.com: http://stackoverflow.com/questions/14020240/why-does-perl-system-corrupt-the-redirected-path [1], but I have no time for more tests during the holiday. I'll share a fix if/when there's a resolution. For now, Hieu's recommendation to document the problem and block such requests in the front end is a good work-around.

Tom

On 2012-12-28 06:43, Kenneth Heafield wrote:
> +Qin Gao
>
> Is this a train-model.perl problem or an mgiza problem?

Links:
------
[1] http://stackoverflow.com/questions/14020240/why-does-perl-system-corrupt-the-redirected-path
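
For reference, the claim that every Thai character takes 3 bytes in UTF-8 can be checked with a short Python sketch. It assumes "Thai" means the Unicode Thai block (U+0E00 to U+0E7F); the function name utf8_lengths is illustrative only and is not part of Moses or MGIZA++. It says nothing about why the redirected path gets corrupted.

```python
def utf8_lengths(first, last):
    """Return the set of UTF-8 byte lengths for code points first..last."""
    return {len(chr(cp).encode("utf-8")) for cp in range(first, last + 1)}


# Assumption: "Thai characters" means the Unicode Thai block.
THAI_FIRST, THAI_LAST = 0x0E00, 0x0E7F

if __name__ == "__main__":
    # Every code point in the block should report the same length.
    print(utf8_lengths(THAI_FIRST, THAI_LAST))
    # Show the raw bytes of one Thai letter (KO KAI, U+0E01).
    print(chr(0x0E01).encode("utf-8").hex(" "))
```

Dumping the raw bytes of an affected path the same way may help narrow down which byte values trigger the corruption.
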
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
