Re: [Moses-support] Encoding problems for the parallel corpora and language model

Barry Haddow Thu, 22 Mar 2012 15:13:27 -0700

Hi Loki

On Thursday 22 Mar 2012 12:38:19 Loki Cheng wrote:
> Hi, Philipp
> I have a few questions below:
> 1. What happens if the parallel corpora are encoded in UTF8 and the
> language model generated by IRSTLM is encoded in BIG5?  Will it work?


This is unlikely to work. The parallel data and the language model training 
data should use the same character encoding.
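
If the language model training text is currently in BIG5, one option is to 
convert it to UTF-8 before building the language model. A minimal sketch in 
Python, assuming the file really is valid BIG5; the function name and the file 
names in the usage note are hypothetical:

```python
def convert_encoding(src_path, dst_path, src_enc="big5", dst_enc="utf-8"):
    """Copy a text file, re-encoding it from src_enc to dst_enc.

    Raises UnicodeDecodeError if the input is not valid src_enc, which is
    a useful signal that the file is not in the encoding you assumed.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_enc, newline="") as src, \
         open(dst_path, "w", encoding=dst_enc, newline="") as dst:
        for line in src:  # stream line by line so large corpora fit in memory
            dst.write(line)
```

For example, convert_encoding("lm-train.big5.txt", "lm-train.utf8.txt") would 
write a UTF-8 copy that can then be passed to the language model build step.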

> 2. If 1. won't work, do I need to regenerate the language model encoded
> in UTF8 and rerun the "train-model.perl" script, since the language model
> is replaced with the newer one?

You don't need to rerun train-model.perl if you are just changing the language 
model, but you do need to update the ini file and rerun tuning.
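
As a rough illustration of the ini update, here is a sketch in Python that 
swaps the language model path in the text of a moses.ini. It assumes the 
older layout in which an [lmodel-file] section holds lines of the form 
"<type> <factor> <order> <path>"; check your own ini file first, since the 
layout may differ between Moses versions. The function name and paths are 
hypothetical.

```python
def replace_lm_path(ini_text, new_lm_path):
    """Return ini_text with the path of each [lmodel-file] entry replaced."""
    out = []
    in_lm_section = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        # A bracketed line starts a new section.
        if stripped.startswith("[") and stripped.endswith("]"):
            in_lm_section = (stripped == "[lmodel-file]")
            out.append(line)
            continue
        if in_lm_section and stripped and not stripped.startswith("#"):
            fields = stripped.split()
            # Assumed entry layout: <type> <factor> <order> <path>
            if len(fields) == 4:
                fields[3] = new_lm_path
                line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

Everything outside [lmodel-file] is passed through unchanged, so the 
translation model entries stay as train-model.perl wrote them.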

cheers - Barry


