Hello Ergun,

We've had the 'nan' issue reported before (see https://moses-support.mit.narkive.com/hs8LwsnT/blingual-neural-lm-log-likelihood-nan and https://moses-support.mit.narkive.com/fklzlBiW/bilingual-lm-nan-nan-nan).
You can follow Nick's recommendation of lowering the learning rate, or try enabling gradient clipping (which is commented out in the code). I'm afraid NPLM is no longer heavily used, so it's unlikely that anybody has fresh experience with it.

best wishes,
Rico

On 15/04/2019 12:44, Ergun Bicici wrote:
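For readers unfamiliar with the suggestion above: gradient clipping means rescaling any update whose norm exceeds a threshold, which bounds each step and avoids the overflow that shows up as 'nan' losses. A minimal, self-contained Python sketch (the function name and threshold are illustrative, not NPLM's actual code):

```python
import math

def clip_gradient(grad, max_norm=5.0):
    """Rescale a gradient vector so its L2 norm is at most max_norm.

    Oversized gradients are a common cause of 'nan' NCE log-likelihoods;
    clipping keeps each parameter update bounded."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return grad
```

Lowering the learning rate attacks the same problem from the other side: smaller steps also keep updates from diverging.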
I found that training also produced 'nan' scores: "Training NCE log-likelihood: nan". I used EMS training:

  [LM:comb]
  nplm-dir = "Programs/nplm/"
  order = 5
  source-window = 4
  bilingual-lm = yes
  bilingual-lm-settings = "--prune-source-vocab 100000 --prune-target-vocab 100000"

I am re-running train_nplm.py.

Ergun

On Mon, Apr 15, 2019 at 2:26 PM Ergun Bicici <[email protected]> wrote:

Dear moses-support,

I tried the nplm model on the German-English baseline dataset (wget http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz) and it improved the scores from 0.2266 to 0.2317 BLEU. I then tried the bilingual LM: http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc37

However:
- The vocabulary files were not written at the end, so I used extract_training.py to obtain them.
- I still obtained 'nan' scores from the bilingual LM model. Error:

  "Not a label, not a score 'nan'. Failed to parse the scores string:
  0 ||| ... айта ... болатын . ||| LexicalReordering0= -11.3723 -15.4848 -26.5152 -17.8301 -6.95664 -16.8553 -29.4425 -22.5538 OpSequenceModel0= -403.825 99 22 45 5 Distortion0= -146 LM0= -685.828 BLMcomb= nan WordPenalty0= -76 PhrasePenalty0= 53 TranslationModel0= -242.874 -179.189 -291.623 -342.085 ||| nan

  KENLM name=LM0 factor=0 path=en-kk/lm.corpus.tok.kk.6.blm.bin order=6
  BilingualNPLM name=BLMcomb order=5 source_window=4 path=wmt19_en-kk/lm/comb.blm.2/train.10 source_vocab=wmt19_en-kk/lm/comb.blm.2/vocab.source target_vocab=wmt19_en-kk/lm/comb.blm.2/vocab.target

Therefore, this may be due to a bug in the Moses C++ code rather than the input data or configuration.

The documentation also appears to be out of sync regarding the "average the <null> word embedding as per the instructions here <http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#anchorNULL>" step, since averageNullEmbedding.py asks for -i, -o, and -t.
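As a quick diagnostic for errors like the one above, a small hypothetical helper (not part of Moses) can scan the n-best feature-score string and report which feature function emitted 'nan' — here it would single out BLMcomb:

```python
import math

def find_nan_features(scores):
    """Return the names of feature functions whose values include 'nan'
    in a Moses n-best feature string like 'LM0= -685.8 BLMcomb= nan'."""
    bad, current = set(), None
    for tok in scores.split():
        if tok.endswith("="):            # a feature name, e.g. "BLMcomb="
            current = tok[:-1]
        else:
            try:
                if math.isnan(float(tok)):
                    bad.add(current)
            except ValueError:           # skip non-numeric tokens such as '|||'
                pass
    return sorted(bad)
```

This only localizes the failing feature; it does not explain why the BilingualNPLM scores are nan (bad model file vs. decoder bug).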
I also found a related note in a WMT'15 paper by Barry Haddow saying that the model was not used in the final submission because the differences were insignificant. Do you have any recent results with the bilingual LM model?

--
Regards,
Ergun

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
