Hello Ergun,

We've had the 'nan' issue reported before (see
https://moses-support.mit.narkive.com/hs8LwsnT/blingual-neural-lm-log-likelihood-nan
https://moses-support.mit.narkive.com/fklzlBiW/bilingual-lm-nan-nan-nan ).

You can follow Nick's recommendation of lowering the learning rate, or
try to enable gradient clipping (which is commented out in the code).
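For illustration, norm-based gradient clipping amounts to something like the following (a minimal Python sketch of the general technique, not nplm's actual C++ code; the threshold value here is made up):

```python
import math

def clip_gradient(grad, threshold=5.0):
    """Rescale a gradient vector so its L2 norm never exceeds `threshold`.

    Illustrative only: nplm's own (commented-out) clipping lives in its
    C++ source, and the default threshold here is a placeholder.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        grad = [g * threshold / norm for g in grad]
    return grad

print(clip_gradient([30.0, 40.0]))  # norm 50 rescaled to 5 -> [3.0, 4.0]
```

Capping the update norm this way keeps a single large gradient from blowing the weights up to inf/nan, which is why it is the usual alternative to simply lowering the learning rate.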

I'm afraid nplm is no longer heavily used, so it's unlikely that
somebody has fresh experience.

best wishes,
Rico

On 15/04/2019 12:44, Ergun Bicici wrote:

I found that training also produced 'nan' scores:
Training NCE log-likelihood: nan.

I used EMS training:
[LM:comb]
nplm-dir = "Programs/nplm/"
order = 5
source-window = 4
bilingual-lm = yes
bilingual-lm-settings = "--prune-source-vocab 100000
--prune-target-vocab 100000"

I am re-running train_nplm.py.

Ergun

On Mon, Apr 15, 2019 at 2:26 PM Ergun Bicici <[email protected]> wrote:


    Dear moses-support,

    I tried the nplm model on the German-English baseline dataset
    ( wget
    http://www.statmt.org/wmt13/training-parallel-nc-v8.tgz) and it
    improved the scores from 0.2266 to 0.2317 BLEU.

    I tried the bilingual LM:
    http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#ntoc37
    However:
    - vocab files were not written at the end, so I used
    extract_training.py to obtain them.
    - I still obtained 'nan' scores from the bilingual LM model.
    Error: "Not a label, not a score 'nan'. Failed to parse the scores
    string:
    0 ||| ... айта ... болатын .  ||| LexicalReordering0= -11.3723
    -15.4848 -26.5152 -17.8301 -6.95664 -16.8553 -29.4425 -22.5538
    OpSequenceModel0= -403.825 99 22 45 5 Distortion0= -146 LM0=
    -685.828 BLMcomb= nan WordPenalty0= -76 PhrasePenalty0= 53
    TranslationModel0= -242.874 -179.189 -291.623 -342.085 ||| nan
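    In case it helps others hitting the same error, a quick way to spot
    which feature produced the nan in such a line is to scan the scores
    field (a hypothetical helper, not part of Moses; it assumes the
    standard ' ||| '-separated n-best format shown above):

```python
import math

def find_nan_features(nbest_line):
    """Return the names of features with a nan value in a Moses
    n-best line of the form: id ||| hypothesis ||| scores ||| total.

    Illustrative helper only, not part of the Moses distribution.
    """
    scores_field = nbest_line.split(" ||| ")[2]
    bad, current_name = [], None
    for tok in scores_field.split():
        if tok.endswith("="):          # feature label, e.g. "BLMcomb="
            current_name = tok[:-1]
        else:
            try:
                if math.isnan(float(tok)):
                    bad.append(current_name)
            except ValueError:
                pass                   # non-numeric token, skip
    return bad

line = "0 ||| hyp ||| LM0= -685.8 BLMcomb= nan WordPenalty0= -76 ||| nan"
print(find_nan_features(line))  # -> ['BLMcomb']
```

    In my case this points at BLMcomb, i.e. the BilingualNPLM feature,
    which is what led me to suspect the bilingual LM rather than the
    other models.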

    KENLM name=LM0 factor=0 path=en-kk/lm.corpus.tok.kk.6.blm.bin order=6
    BilingualNPLM name=BLMcomb order=5 source_window=4
    path=wmt19_en-kk/lm/comb.blm.2/train.10
    source_vocab=wmt19_en-kk/lm/comb.blm.2/vocab.source
    target_vocab=wmt19_en-kk/lm/comb.blm.2/vocab.target

    Therefore, this may be due to some bug in the Moses C++ code rather
    than the input data or configuration.

    The documentation also appears to be out of sync regarding the
    "average the <null> word embedding as per the instructions here
    <http://www.statmt.org/moses/?n=FactoredTraining.BuildingLanguageModel#anchorNULL>"
    part, since averageNullEmbedding.py asks for -i, -o, and -t.

    I found a related note in a paper by Barry Haddow at WMT'15 saying
    that the model was not used in the final submission because the
    differences were insignificant.

    Do you have any recent results on the bilingual LM model?

    --

    Regards,
    Ergun




--

Regards,
Ergun



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
