Hello Sanjanasri,

Basically, you can forget all results that you obtained without tuning. They are not a meaningful indicator of the quality of NPLM. If you add a new language model, the weights of the other language models, translation models etc. need to be rebalanced accordingly, and that is what tuning does.
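
For illustration, a typical MERT run with the standard Moses script looks like this (paths and file names are placeholders for your setup):

    $MOSES/scripts/training/mert-moses.pl dev.src dev.ref \
        $MOSES/bin/moses moses.ini \
        --mertdir $MOSES/bin --working-dir mert-work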

If you do tuning every time you add/remove/change a model, you can modify your moses.ini any way you want, and you do not need to specify all models during training. But again, don't draw conclusions without tuning for each modified moses.ini. I know this will slow down your experiments, but that's the only way to produce new knowledge.
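
For example, a moses.ini that combines a KenLM and an NPLM model would contain feature and weight lines along these lines (paths are placeholders; the weights are just starting values that tuning will overwrite):

    [feature]
    KENLM name=LM0 factor=0 order=5 path=/path/to/kenlm.bin
    NeuralLM name=LM1 factor=0 order=5 path=/path/to/nplm.model

    [weight]
    LM0= 0.5
    LM1= 0.5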

setting num_hidden to zero is a bit of a hack, because NPLM normally has two hidden layers (and num_hidden gives the size of the first hidden layer). If we want to use NPLM in decoding, we only want one hidden layer for speed, hence num_hidden=0 [output_embedding_dimension is the size of the remaining hidden layer]. The vocabulary size thing was a guess. I don't know what the best vocabulary size would be for you, but I tend to recommend smaller vocabularies if you have less data.
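
As a sketch of what I mean (the flag names below follow NPLM's prepareNeuralLM/trainNeuralNetwork tools, but please check --help on your version, and treat the sizes as illustrative only):

    # numberize the corpus with a capped vocabulary
    prepareNeuralLM --train_text corpus.txt --ngram_size 5 \
        --vocab_size 20000 --validation_size 500 \
        --write_words_file words.txt \
        --train_file train.ngrams --validation_file validation.ngrams

    # num_hidden=0 leaves a single hidden layer (for decoding speed);
    # output_embedding_dimension is then the size of that layer
    trainNeuralNetwork --train_file train.ngrams \
        --validation_file validation.ngrams --words_file words.txt \
        --num_epochs 20 --learning_rate 1 --minibatch_size 1000 \
        --num_hidden 0 --input_embedding_dimension 150 \
        --output_embedding_dimension 750 --model_prefix nplm.model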

best wishes,
Rico


On 05/10/15 17:14, Sanjanashree Palanivel wrote:
Dear Rico,

Thanks a lot. I will increase the number of hidden layers; currently the parameter num_hidden is fixed at zero. I did not tune the system, I was just working with the baseline. I will do tuning too. But why a smaller vocabulary size, what is the reason? It could be my mistake: the test set may not be a disjoint set, I guess. I will check. I have two more doubts. Please clarify.


1) Can I use two or more LMs (specifically NPLMs with different vocabulary sizes) in the moses.ini file, or should I only use the LMs that were used in the training phase? E.g., can I add SRILM or RandLM directly to the moses.ini file without specifying it in the training phase?

2) I did a little tinkering with the moses.ini file of the English-Hindi MT system to understand the influence of the LM. I trained the system with a KenLM of order 3, but in the *.ini file I replaced the LM with a 5-gram KenLM binary file. Let me call this *.ini file dummy 5-KENLM. The BLEU score of dummy 5-KENLM (14.72) is higher than that of the 3-gram KenLM (12.53), though lower than that of the model actually trained with the 5-gram KenLM (17.43). I understand that my modification of the *.ini file does not make much sense, but I wish to know why the score is higher for dummy 5-KENLM than for 3-KENLM.



On Mon, Oct 5, 2015 at 7:07 PM, Rico Sennrich <[email protected]> wrote:

    Hi Sanjanasri,

    1) your corpus is very small, and you may have to use more
    iterations of NPLM training and smaller vocabulary sizes. Just to
    double-check, are you tuning your systems? MERT (or PRO or MIRA)
    should normally ensure that adding a model doesn't make BLEU go down.

    2) I'm not sure which perplexity is for which model, but lower
    perplexity is better, so this makes sense.

    3) a perplexity of 3 is *extremely* low. Do you have overlap
    between your test set and your training set? This would be an
    unrealistic test setting, and would explain why KenLM does so much
    better (because backoff n-gram models are good at memorizing things).
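
    A quick way to check (a sketch with standard Unix tools, assuming one
    sentence per line in both files):

        sort -u test.txt > test.sorted
        sort -u train.txt > train.sorted
        # test sentences that appear verbatim in the training data
        comm -12 test.sorted train.sorted | wc -l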

    best wishes,
    Rico



    On 05.10.2015 09:27, Sanjanashree Palanivel wrote:
    Dear Rico,

    I tried using KenLM and NPLM for three language pairs, and I came
    across a series of questions. I am listing them one by one. It would
    be great if you could guide me.


    1) I tested NPLM with different vocabulary sizes and training
    epochs. But the BLEU score I obtained from NPLM combined with KenLM
    is smaller than the one from the system trained with KenLM alone. In
    all three language pairs I see a consistent difference of about
    three points.

    E.g.: English to Hindi (KENLM 17.43, NPLM+KENLM 14.27)
          Tamil to Hindi (KENLM 16.66, NPLM+KENLM 13.53)
          Marathi to Hindi (KENLM 29.42, NPLM+KENLM 25.76)

    The sentence count is 103502 and the unigram count is 89919. I set
    the vocabulary size to 89000, 89700 and 89850, with validation sizes
    of 200, 200 and 100 respectively, and tried different learning rates
    and numbers of epochs. However, the BLEU score of NPLM+KenLM is
    still lower.


    2) The model with a perplexity of about 385 has a higher BLEU score
    than the one with a perplexity of around 564. Is this the right
    model? I mean, the model with the lower perplexity seems to give the
    better BLEU score. Where am I going wrong?


    3) I used the query script on the KenLM model and found the
    perplexity to be 3.4xx. The BLEU score of KenLM alone in the
    decoding phase is 16.66 for English-to-Hindi MT, but when combined
    with NPLM I get only 13.53.

    On Sun, Sep 20, 2015 at 8:07 PM, Sanjanashree Palanivel
    <[email protected]> wrote:

        Dear Rico,

                    Thanks a lot for your excellent guidance.

        On Sat, Sep 19, 2015 at 9:10 PM, Rico Sennrich
        <[email protected]> wrote:

            Hi Sanjanasri,

            we have seen improvements in BLEU from having both KENLM
            and NPLM in our system. Things can go wrong during
            training though (e.g. a bad choice of hyperparameters
            (vocabulary size, number of training epochs)). I
            recommend using a development set during NPLM training,
            and comparing perplexity scores with those obtained from
            KENLM.
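
            For the KenLM side, the query tool that is built alongside
            Moses reports perplexity over a held-out file, e.g. (paths
            are placeholders):

                bin/query /path/to/lm.bin < dev.txt

            NPLM prints validation perplexity after each epoch if you
            pass it a validation file during training.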

            maybe somebody else can help you with the phrase table
            binarization. NPLM doesn't have binarization.

            best wishes,
            Rico


            On 19/09/15 08:11, Sanjanashree Palanivel wrote:
            Dear Rico,

            I made the necessary changes and trained the language model
            successfully. The NPLM language model gives me a lower BLEU
            score compared to KenLM. But when I use the two models
            together, the accuracy is greater than with NPLM alone,
            though still lower than with KenLM. I am trying to tune it
            by changing the parameters. So far the accuracy is
            improving, but it is not close to the KenLM accuracy. Is
            this worth doing, given that training takes quite a long
            time?

             I also tried to binarize the phrase table following
            http://www.statmt.org/moses/?n=Advanced.RuleTables#ntoc3, and
            compilation with Moses finished successfully. But when I run

                processPhraseTableMin -threads 3 \
                    -in train/model/phrase-table.gz \
                    -nscores 4 -out binarised-model/phrase-table

            I get a segmentation fault. I don't know what is wrong. Does
            it have something to do with the threads? Also, how do I
            binarize the NPLM model?

            On Fri, Sep 18, 2015 at 11:27 AM, Sanjanashree Palanivel
            <[email protected]>
            wrote:

                Dear Rico,

                             Thanks a lot. Will do the necessary
                changes


                On Thu, Sep 17, 2015 at 1:54 PM, Rico Sennrich
                 <[email protected]>
                wrote:

                    Hi Sanjanasri,

                    if you first compiled moses without the option
                    '--with-nplm' and then added the option later, the
                    build system isn't smart enough to know which files
                    it needs to recompile. if you change one of the
                    compile options, use the option '-a' to force
                    recompilation from scratch.
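
                    for example, with the path from your earlier mail:

                        ./bjam --with-nplm=/home/sanjana/Documents/SMT/NPLM/nplm -a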

                    best wishes,
                    Rico




                    On 16/09/15 06:30, Sanjanashree Palanivel wrote:
                    Dear Rico,


                    I did the following steps


                        1. Installed NPLM and trained a language model
                        2. I compiled it with Moses with the
                        command ./bjam --with-nplm=path/to/nplm

                                  ./bjam --with-nplm=/home/sanjana/Documents/SMT/NPLM/nplm

                        Tip: install tcmalloc for faster threading.
                        See BUILD-INSTRUCTIONS.txt for more information.
                        warning: No toolsets are configured.
                        warning: Configuring default toolset "gcc".
                        warning: If the default is wrong, your
                        build may not work correctly.
                        warning: Use the "toolset=xxxxx" option to
                        override our guess.
                        warning: For more configuration options,
                        please consult
                        warning: http://boost.org/boost-build2/doc/html/bbv2/advanced/configuration.html
                        NOT BUILDING MOSES SERVER!
                        Performing configuration checks

                            - Shared Boost : yes (cached)
                            - Static Boost : yes (cached)
                        ...patience...
                        ...patience...
                        ...found 4823 targets...
                        SUCCESS

                        3. I added the following lines to the moses.ini file:

                             NeuralLM factor=0 name=LM1 order=5 path=/path/to/nplmmodel
                             LM1= 0.5

                    Then I did testing and ended up with the error.


                    On Tue, Sep 15, 2015 at 8:43 PM, Rico Sennrich
                    <[email protected]> wrote:

                        Hi Sanjanasri,

                        this error occurs when Moses was compiled
                        without the option '--with-nplm'.

                        best wishes,
                        Rico



                        On 15.09.2015 15:08, Sanjanashree Palanivel wrote:
                        Dear Rico,

                                    I updated Moses, and NPLM has been
                        compiled successfully with Moses. However, when
                        I perform decoding I am getting an error.

                            Defined parameters (per moses.ini or switch):
                                config: /home/sanjana/Documents/SMT/ICON15/Health/BL/Ta_H/model/moses.ini
                                distortion-limit: 6
                                feature: UnknownWordPenalty WordPenalty PhrasePenalty PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/sanjana/Documents/SMT/ICON15/Health/BL/Ta_H/model/phrase-table.gz input-factor=0 output-factor=0 Distortion KENLM lazyken=0 name=LM0 factor=0 path=/home/sanjana/Documents/SMT/LM/Hindi/monolin80k.hi1.bin order=3 NeuralLM factor=0 name=LM1 order=3 path=/home/sanjana/Documents/SMT/LM/Hindi/hin_out.txt
                                input-factors: 0
                                mapping: 0 T 0
                                weight: Distortion0= 0.136328 LM0= 0.135599 LM1= 0.5 WordPenalty0= -0.488892 PhrasePenalty0= 0.0826147 TranslationModel0= 0.0104273 0.0663914 0.0254094 0.0543384 UnknownWordPenalty0= 1
                            line=UnknownWordPenalty
                            FeatureFunction: UnknownWordPenalty0 start: 0 end: 0
                            line=WordPenalty
                            FeatureFunction: WordPenalty0 start: 1 end: 1
                            line=PhrasePenalty
                            FeatureFunction: PhrasePenalty0 start: 2 end: 2
                            line=PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/home/sanjana/Documents/SMT/ICON15/Health/BL/Ta_H/model/phrase-table.gz input-factor=0 output-factor=0
                            FeatureFunction: TranslationModel0 start: 3 end: 6
                            line=Distortion
                            FeatureFunction: Distortion0 start: 7 end: 7
                            line=KENLM lazyken=0 name=LM0 factor=0 path=/home/sanjana/Documents/SMT/LM/Hindi/monolin80k.hi1.bin order=3
                            FeatureFunction: LM0 start: 8 end: 8
                            line=NeuralLM factor=0 name=LM1 order=3 path=/home/sanjana/Documents/SMT/LM/Hindi/hin_out.txt
                            Exception: moses/FF/Factory.cpp:349 in void Moses::FeatureRegistry::Construct(const string&, const string&) threw UnknownFeatureException because `i == registry_.end()'.
                            Feature name NeuralLM is not registered.


                        I added the following two lines to my moses.ini file:

                         NeuralLM factor=0 name=LM1 order=5
                        path=/path/to/nplmmodel
                        LM1= 0.5



                        On Tue, Sep 15, 2015 at 5:06 PM,
                        Sanjanashree Palanivel
                        <[email protected]> wrote:

                            Thank you for your earnest response. I will
                            update Moses and try again.

                            On Tue, Sep 15, 2015 at 4:22 PM, Rico
                            Sennrich <[email protected]> wrote:

                                Hello Sanjanasri,

                                this looks like a version mismatch
                                between Moses and NPLM.
                                Specifically, you're using an
                                older Moses commit that is only
                                compatible with nplm 0.2 (or
                                specifically, Kenneth's fork at
                                https://github.com/kpu/nplm ).

                                If you use the latest Moses
                                version from
                                https://github.com/moses-smt/mosesdecoder
                                , and the latest nplm version from
                                https://github.com/moses-smt/nplm
                                , it should work.
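
                                roughly (a sketch; see each project's
                                README or BUILD-INSTRUCTIONS.txt for the
                                exact build steps):

                                    git clone https://github.com/moses-smt/nplm
                                    # build nplm as described in its README
                                    git clone https://github.com/moses-smt/mosesdecoder
                                    cd mosesdecoder
                                    ./bjam --with-nplm=/path/to/nplm -a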

                                best wishes,
                                Rico


                                On 15.09.2015 08:24, Sanjanashree Palanivel wrote:

                                Dear all,

                                I tried building a language model with
                                NPLM. The language model was built
                                successfully, but when I tried to
                                compile Moses with NPLM using "./bjam
                                --with-nplm=path/to/nplm" I got an
                                error. I am using Boost 1.55. I am
                                attaching the log file for reference. I
                                don't know where I went wrong. Any help
                                would be appreciated.


                                --
                                Thanks and regards,
                                Sanjanasri J.P






                            --
                            Thanks and regards,
                            Sanjanasri J.P




                        --
                        Thanks and regards,
                        Sanjanasri J.P






                    --
                    Thanks and regards,
                    Sanjanasri J.P






                --
                Thanks and regards,
                Sanjanasri J.P




            --
            Thanks and regards,
            Sanjanasri J.P






        --
        Thanks and regards,
        Sanjanasri J.P




    --
    Thanks and regards,
    Sanjanasri J.P




--
Thanks and regards,

Sanjanasri J.P

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
