Hi Rico,

Thanks for the links. Accordingly, I tried decreasing the learning rate to 0.25 and started seeing numbers instead of nan in the log-likelihood. The vocabulary files are not needed when using train_nplm.py.
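In case it helps others hitting the same problem, the effect of a too-large learning rate is easy to reproduce with a toy gradient-descent loop (plain Python; this is only an illustration, not the nplm optimizer):

```python
import math

def sgd(lr, steps=1200):
    """Minimize f(w) = w**2 by gradient descent; the gradient is 2*w."""
    w = 1.0
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w

print(sgd(0.25))             # converges towards 0.0
print(math.isnan(sgd(1.5)))  # True: the iterate overflows to inf, then inf - inf gives nan
```

With lr = 0.25 each step halves w, so it converges; with lr = 1.5 each step doubles |w|, the iterate overflows to inf, and the next update produces inf - inf = nan, after which nan propagates everywhere, just like in the NCE log-likelihood.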
I restarted tuning and the 'nan' scores for the bilingual LM disappeared from the N-best lists as well. I'll post the new scores on the German-English baseline.

Ergun

On Mon, Apr 15, 2019 at 3:43 PM Rico Sennrich <[email protected]> wrote:

> Hello Ergun,
>
> we've had the 'nan' issue reported before (see
> https://moses-support.mit.narkive.com/hs8LwsnT/blingual-neural-lm-log-likelihood-nan
> https://moses-support.mit.narkive.com/fklzlBiW/bilingual-lm-nan-nan-nan ).
>
> You can follow Nick's recommendation of lowering the learning rate, or try
> to enable gradient clipping (which is commented out in the code).
>
> I'm afraid nplm is no longer heavily used, so it's unlikely that somebody
> has fresh experience.
>
> best wishes,
> Rico
>
> On 15/04/2019 12:44, Ergun Bicici wrote:
>
> I found that training also produced 'nan' scores:
>
> Training NCE log-likelihood: nan
>
> I used EMS training:
>
> [LM:comb]
> nplm-dir = "Programs/nplm/"
> order = 5
> source-window = 4
> bilingual-lm = yes
> bilingual-lm-settings = "--prune-source-vocab 100000 --prune-target-vocab 100000"
>
> I am re-running train_nplm.py.
>
> Ergun
>
> On Mon, Apr 15, 2019 at 2:26 PM Ergun Bicici <[email protected]> wrote:
>
>> Dear moses-support,
>>
>> I tried the nplm model on the German-English baseline dataset (wget
>> http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz) and it improved
>> the scores from 0.2266 to 0.2317 BLEU.
>>
>> I tried the bilingual LM:
>> http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc37
>>
>> However:
>> - the vocab files were not written at the end, and I used extract_training.py
>>   to obtain them;
>> - I still obtained 'nan' scores from the bilingual LM model:
>>
>> Error: "Not a label, not a score 'nan'. Failed to parse the scores string:
>> 0 ||| ... айта ... болатын . ||| LexicalReordering0= -11.3723 -15.4848
>> -26.5152 -17.8301 -6.95664 -16.8553 -29.4425 -22.5538 OpSequenceModel0=
>> -403.825 99 22 45 5 Distortion0= -146 LM0= -685.828 BLMcomb= nan
>> WordPenalty0= -76 PhrasePenalty0= 53 TranslationModel0= -242.874 -179.189
>> -291.623 -342.085 ||| nan
>>
>> KENLM name=LM0 factor=0 path=en-kk/lm.corpus.tok.kk.6.blm.bin order=6
>> BilingualNPLM name=BLMcomb order=5 source_window=4
>> path=wmt19_en-kk/lm/comb.blm.2/train.10
>> source_vocab=wmt19_en-kk/lm/comb.blm.2/vocab.source
>> target_vocab=wmt19_en-kk/lm/comb.blm.2/vocab.target
>>
>> Therefore, this may be due to a bug in the Moses C++ code rather than the
>> input data / configuration.
>>
>> The documentation also appears to be out of sync regarding the "average the <null>
>> word embedding as per the instructions here
>> <http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#anchorNULL>"
>> part, since averageNullEmbedding.py asks for -i, -o, and -t.
>>
>> I found a related note in a paper by Barry Haddow at WMT'15 saying that
>> the model was not used in the final submission due to insignificant
>> differences.
>>
>> Do you have any recent results on the bilingual LM model?
>>
>> --
>> Regards,
>> Ergun
>
> --
> Regards,
> Ergun
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Regards,
Ergun
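P.S. While debugging, the offending hypotheses can be pulled out of the N-best list before tuning fails to parse them. A quick sketch, assuming the standard Moses N-best format (`id ||| hypothesis ||| feature scores ||| total score`); the helper name is mine:

```python
def find_nan_hypotheses(nbest_lines):
    """Return (sentence_id, feature_string) pairs for hypotheses whose
    feature scores or total score contain 'nan'."""
    bad = []
    for line in nbest_lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 4:
            continue  # not a standard N-best entry
        sent_id, _hyp, features, total = fields[:4]
        tokens = features.split() + [total]
        if any(t.lower() in ("nan", "-nan") for t in tokens):
            bad.append((int(sent_id), features))
    return bad

nbest = [
    "0 ||| ... ||| LM0= -685.828 BLMcomb= nan WordPenalty0= -76 ||| nan",
    "1 ||| ... ||| LM0= -12.5 WordPenalty0= -2 ||| -14.5",
]
print(find_nan_hypotheses(nbest))  # [(0, 'LM0= -685.828 BLMcomb= nan WordPenalty0= -76')]
```

In my case every flagged hypothesis had the nan coming from the BLMcomb feature, which pointed at the bilingual LM rather than the other models.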
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
