Hello again,

I have checked the encoding of all my LM, TM, tuning and test set files, and they are all UTF-8, so encoding does not seem to be the issue. I also verified that I did not mix up the source and target sides while building and using the LMs. I am retraining all my model files and will try this again.
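In case it is useful to anyone else debugging the same thing, here is a minimal sketch of this kind of UTF-8 sanity check in Python. The function names and sample text are just illustrative (not from Moses or NPLM); the Shift_JIS case shows how a mis-encoded CJK file fails to decode:

```python
# Minimal sketch: verify that corpus bytes decode cleanly as UTF-8.
# Function names and sample text are illustrative, not part of Moses/NPLM.

def is_utf8(data: bytes) -> bool:
    """Return True if `data` is valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def file_is_utf8(path: str) -> bool:
    """Return True if the entire file at `path` is valid UTF-8."""
    with open(path, "rb") as f:
        return is_utf8(f.read())

# UTF-8-encoded Japanese passes; the same text in Shift_JIS (a common
# source of mojibake in CJK corpora) is rejected.
text = "言語モデル"
print(is_utf8(text.encode("utf-8")))      # True
print(is_utf8(text.encode("shift_jis")))  # False
```

Running a check like `file_is_utf8` over every LM, TM, tuning and test file before training would catch the kind of encoding mismatch discussed below.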
Regards.

On Tue, Jul 7, 2015 at 12:43 PM, Raj Dabre <[email protected]> wrote:

> Hello Rico,
>
> Now that you mention it, I also performed an additional test. I took a
> translation and obtained the perplexity score by querying the KenLM and
> NPLM models from the command line. In this case the difference between
> the scores was not that large. It might be an encoding issue. I will
> check again and let you know.
>
> However, the data I am using to train the LMs (KenLM, NPLM and BiLM) is
> the same as I am using to train. I should also mention that I did no
> tokenization etc. before training the LMs and the TM.
>
> Thanks for your replies.
> Regards.
>
> On Tue, Jul 7, 2015 at 1:18 AM, Rico Sennrich <[email protected]> wrote:
>
>> Hi Raj,
>>
>> The information you provide is pretty vague, so I'm just making some
>> wild guesses here:
>>
>> It could be a user error, for instance an inconsistency between the
>> training sets used for training BilingualNPLM and the phrase table.
>> Check that the same version of the corpus (including tokenization,
>> truecasing etc.) was used for training, and that you did not mix up the
>> source and target language. Also check that the settings during
>> training are consistent with those in the moses.ini file.
>>
>> It's possible that some of the settings (vocabulary size, number of
>> training epochs, or similar) are unsuitable for your task. For example,
>> since you have a relatively small training corpus, you may need more
>> epochs of training to get good results (use a validation set to see
>> whether model perplexity converges).
>>
>> Please double-check that there were no problems with the Unicode
>> handling of Japanese/Chinese characters, and that the encoding of your
>> vocabulary files matches that of the translation model and the decoder
>> input. We have never experienced such problems, but they could arise
>> for some system configurations.
>> Best wishes,
>> Rico
>>
>> On 06.07.2015 16:31, Raj Dabre wrote:
>>
>> Hello Rico,
>>
>> I trained both monolingual as well as bilingual LMs, and both seemed
>> ineffective. As I mentioned before, I am working with Chinese-Japanese
>> and the domain is paper abstracts. I did check the n-best lists, and I
>> saw a significant difference between the LM scores when comparing the
>> runs for KenLM and NPLM. What could have gone wrong during the
>> training?
>>
>> Regards.
>>
>> On Mon, Jul 6, 2015 at 10:53 PM, Rico Sennrich <[email protected]> wrote:
>>
>>> Hello Raj,
>>>
>>> Can you please clarify whether you tried to train a monolingual LM
>>> (NeuralLM), a bilingual LM (BilingualNPLM), or both? Our previous
>>> experiences with BilingualNPLM are mixed: we observed improvements for
>>> some tasks and language pairs, but not for others. See for instance:
>>>
>>> Alexandra Birch, Matthias Huck, Nadir Durrani, Nikolay Bogoychev and
>>> Philipp Koehn. 2014. Edinburgh SLT and MT System Description for the
>>> IWSLT 2014 Evaluation. Proceedings of IWSLT 2014.
>>>
>>> To help debugging, you can check the scores in the n-best lists of the
>>> tuning runs. If the NPLM features give much higher costs than KenLM
>>> (trained on the same data), this can indicate that something went
>>> wrong during training.
>>>
>>> Best wishes,
>>> Rico
>>>
>>> On 06.07.2015 14:29, Raj Dabre wrote:
>>>
>>> Dear all,
>>>
>>> I have checked out the latest version of Moses and NPLM and compiled
>>> Moses successfully with the --with-nplm option. I got a ton of
>>> warnings during compilation, but in the end it all worked out and all
>>> the desired binaries were created. Simply executing the moses binary
>>> told me that the BilingualNPLM and NeuralLM features were available.
>>> I trained an NPLM model based on the instructions here:
>>> http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc33
>>> The corpus size I used was about 600k lines (for Chinese-Japanese;
>>> the target is Japanese).
>>>
>>> I then integrated the resultant language model (after 10 iterations)
>>> into the decoding process via moses.ini.
>>>
>>> I initiated tuning (standard parameters) and got no errors, which
>>> means that the neural language model (NPLM) was recognized and
>>> queried appropriately. I also ran tuning without a language model.
>>>
>>> The strange thing is that the tuning and test BLEU scores for both
>>> these cases are almost the same. I checked the weights and saw that
>>> the LM was assigned a very low weight.
>>>
>>> On the other hand, when I used KenLM on the same data, I had
>>> comparatively higher BLEU scores.
>>>
>>> Am I missing something? Am I using the NeuralLM in an incorrect way?
>>>
>>> Thanks in advance.
>>>
>>> --
>>> Raj Dabre.
>>> Doctoral Student,
>>> Graduate School of Informatics,
>>> Kyoto University.
>>> CSE MTech, IITB., 2011-2014
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Raj Dabre.
Doctoral Student,
Graduate School of Informatics,
Kyoto University.
CSE MTech, IITB., 2011-2014
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
