Ken, 

I traced this. The problem is not specific to MGIZA++. It first showed
up with merge_alignment.py in step 2 of train-model.perl. It happens
again in step 5 of train-model.perl, when "extract" redirects its
output to a path.

This redirection problem does not happen with the 2- or 3-byte UTF-8
Latin, Japanese or Chinese characters that I tried. All Thai UTF-8
characters are 3 bytes, and there are many documented cases where UTF
handlers don't handle Thai properly. It appears Perl's system() call is
one of these cases.
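
For anyone who wants to check the byte lengths themselves, here is a
minimal Python sketch. The sample characters and the function names are
my own picks for illustration; they are not the characters from the
failing paths, and the script only measures encoded length. It does not
reproduce the system() problem.

```python
# Illustrative samples only; not the characters from the failing paths.
SAMPLES = {
    "Latin (U+00E9)": "\u00e9",
    "Japanese (U+3042)": "\u3042",
    "Chinese (U+4E2D)": "\u4e2d",
    "Thai (U+0E01)": "\u0e01",
}


def utf8_len(ch: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of ch."""
    return len(ch.encode("utf-8"))


def thai_block_lengths() -> set:
    """Collect the UTF-8 lengths of every code point in U+0E00..U+0E7F."""
    return {utf8_len(chr(cp)) for cp in range(0x0E00, 0x0E80)}


if __name__ == "__main__":
    for name, ch in SAMPLES.items():
        print(f"{name}: {utf8_len(ch)} bytes")
    print("Thai block lengths:", thai_block_lengths())
```

Since the Japanese and Chinese samples are also 3 bytes and work fine,
byte length alone does not look like the trigger.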

I'm working through the problem on stackoverflow.com:
http://stackoverflow.com/questions/14020240/why-does-perl-system-corrupt-the-redirected-path
[1], but I have no time for more tests during the holiday. I'll share a
fix if/when there's a resolution.

For now, Hieu's recommendation to document the problem and block such
requests in the front end is a good work-around.

Tom 

On 2012-12-28 06:43, Kenneth Heafield wrote: 

> +Qin Gao
> 
> Is this a train-model.perl problem or an mgiza problem?



Links:
------
[1] http://stackoverflow.com/questions/14020240/why-does-perl-system-corrupt-the-redirected-path
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
