Good to know. I don't think it's obvious that you need that switch for lattice input. Maybe there should be a check of some sort in the mert script.
Sent while bumping into things

On 6 Sep 2013, at 15:42, Yulia Tsvetkov <[email protected]> wrote:

> Hi Hieu,
>
> A quick update: I should have used the --no-filter-phrase-table flag,
> otherwise the phrase table gets filtered. Thanks a lot for your help!!!!
>
> Yulia
>
> On Wed, Sep 4, 2013 at 12:34 PM, Hieu Hoang <[email protected]> wrote:
>> Ok. If you're still stuck please send me your phrase table and I'll try and
>> debug it
>>
>> Sent while bumping into things
>>
>> On 4 Sep 2013, at 17:07, Yulia Tsvetkov <[email protected]> wrote:
>>
>>> phrase table is not empty, it looks normal, here is the snippet:
>>>
>>> no one ||| aucun de ceux ||| 1 0.00157474 0.0060241 5.00684e-06 ||| 0-0 0-1 1-2 ||| 1 166 1
>>> no one ||| ce que personne ||| 0.5 3.7494e-05 0.0060241 5.89199e-06 ||| 0-0 1-2 ||| 2 166 1
>>> no one ||| il que personne ||| 1 9.31515e-05 0.0060241 1.11289e-05 ||| 0-0 1-2 ||| 1 166 1
>>> no one ||| n'est pas le seul ||| 0.0714286 0.0073779 0.0060241 4.54759e-07 ||| 0-0 0-1 1-3 ||| 14 166 1
>>> no one ||| on ne ||| 0.00444444 0.000152764 0.0060241 0.000497078 ||| 1-0 0-1 ||| 225 166 1
>>> no one ||| pas ||| 6.5066e-05 0.000267155 0.0060241 0.294497 ||| 0-0 ||| 15369 166 1
>>>
>>> i don't filter the phrase table...
>>>
>>> I'll debug more, and Chris was going to look at it too, I will send you an
>>> update.
>>>
>>> Thanks!
>>>
>>> Yulia
>>>
>>> On Wed, Sep 4, 2013 at 10:41 AM, Hieu Hoang <[email protected]> wrote:
>>>> hmm, strange. the moses.ini file looks ok. There shouldn't be an issue
>>>> with initialisation. Is the phrase-table empty?
>>>>
>>>> make sure you're not filtering the phrase table, i don't think the filter
>>>> script understands lattices
>>>>
>>>> On 4 September 2013 15:10, Yulia Tsvetkov <[email protected]> wrote:
>>>>> Hi Hieu,
>>>>>
>>>>>> did you manage to get moses working with lattices again? it would be
>>>>>> nice to get some feedback
>>>>>
>>>>> Sorry for not sending feedback earlier -- I was just trying to debug by
>>>>> myself before I send feedback or ask the next question...
>>>>>
>>>>> I was able to run a pipeline with the new settings, thanks a lot for the
>>>>> detailed answer!
>>>>>
>>>>> There is still a problem (with feature initialization?), here is the
>>>>> first lattice translation, looks like all input words are treated as OOVs
>>>>> (and they are not), and then MERT gets killed:
>>>>>
>>>>> BEST TRANSLATION: no|UNK|UNK|UNK one|UNK|UNK|UNK of|UNK|UNK|UNK
>>>>> the|UNK|UNK|UNK intense|UNK|UNK|UNK closures|UNK|UNK|UNK of|UNK|UNK|UNK
>>>>> travel|UNK|UNK|UNK and|UNK|UNK|UNK one|UNK|UNK|UNK of|UNK|UNK|UNK
>>>>> the|UNK|UNK|UNK delights|UNK|UNK|UNK of|UNK|UNK|UNK
>>>>> ethnographic|UNK|UNK|UNK research|UNK|UNK|UNK is|UNK|UNK|UNK
>>>>> the|UNK|UNK|UNK opportunity|UNK|UNK|UNK to|UNK|UNK|UNK live|UNK|UNK|UNK
>>>>> amongst|UNK|UNK|UNK those|UNK|UNK|UNK who|UNK|UNK|UNK have|UNK|UNK|UNK
>>>>> not|UNK|UNK|UNK forgotten|UNK|UNK|UNK the|UNK|UNK|UNK old|UNK|UNK|UNK
>>>>> ways|UNK|UNK|UNK to|UNK|UNK|UNK still|UNK|UNK|UNK feel|UNK|UNK|UNK
>>>>> their|UNK|UNK|UNK pass|UNK|UNK|UNK in|UNK|UNK|UNK the|UNK|UNK|UNK
>>>>> when|UNK|UNK|UNK touch|UNK|UNK|UNK and|UNK|UNK|UNK stones|UNK|UNK|UNK
>>>>> caused|UNK|UNK|UNK by|UNK|UNK|UNK rain|UNK|UNK|UNK tasted|UNK|UNK|UNK
>>>>> leaves|UNK|UNK|UNK of|UNK|UNK|UNK the|UNK|UNK|UNK bitter|UNK|UNK|UNK
>>>>> plants|UNK|UNK|UNK
>>>>> [1111111111111111111111111111111111111111111111111111111111111]
>>>>> [total=-6405.459]
>>>>> core=(-6100.000,-50.000,61.000,0.000,0.000,0.000,0.000,-8.000,-1952.355,0.000)
>>>>>
>>>>> Line 0: Translation took 0.000 seconds total
>>>>> Translating line 1 in thread id 47061808453376
>>>>> sh: line 1: 7333 Killed
>>>>> /home/ytsvetko/tools/mosesdecoder/bin/moses -config filtered/moses.ini
>>>>> -inputtype 2 -weight-overwrite 'InputFeature0= 0.066667 PhrasePenalty0=
>>>>> 0.066667 WordPenalty0= -0.333333 TranslationModel0= 0.066667 0.066667
>>>>> 0.066667 0.066667 Distortion0= 0.100000 LM0= 0.166667' -n-best-list
>>>>> run1.best100.out 100 -input-file
>>>>> /share/workhorse4/ytsvetko/projects/mt_proj/mt_eval/baselines/fr-base-1-lats/tuning/corpus.en
>>>>> > run1.out
>>>>> Exit code: 137
>>>>> The decoder died. CONFIG WAS -weight-overwrite 'InputFeature0= 0.066667
>>>>> PhrasePenalty0= 0.066667 WordPenalty0= -0.333333 TranslationModel0=
>>>>> 0.066667 0.066667 0.066667 0.066667 Distortion0= 0.100000 LM0= 0.166667'
>>>>>
>>>>> I attach my config file, and here is the exact command that I am
>>>>> executing:
>>>>>
>>>>> mert-moses.pl ./tuning/corpus.en ./tuning/corpus.fr
>>>>> /home/ytsvetko/tools/mosesdecoder/bin/moses ./moses.ini --working-dir
>>>>> ./tuning --mertdir /home/ytsvetko/tools/mosesdecoder/mert --inputtype 2
>>>>>
>>>>> Thanks a lot for your help!
>>>>> Yulia
>>>>>
>>>>>> On 2 September 2013 17:03, Hieu Hoang <[email protected]> wrote:
>>>>>>> Hi Yulia
>>>>>>>
>>>>>>> On 1 September 2013 22:46, Yulia Tsvetkov <[email protected]>
>>>>>>> wrote:
>>>>>>>> Dear Moses developers,
>>>>>>>>
>>>>>>>> I am trying to use a new version of Moses, seems like things have
>>>>>>>> changed quite a bit and I have a hard time finding up-to-date
>>>>>>>> documentation. For debugging I used very small train/tune/test corpora
>>>>>>>> (10 lines each).
>>>>>>>>
>>>>>>>> First thing: running the following command produces a phrase table
>>>>>>>> with only 4 features:
>>>>>>>> train-model.perl --root-dir $root_dir --corpus $root_dir/$corpus_name
>>>>>>>> --f $src_lng --e $trg_lng --alignment grow-diag-final --lm 0:3:$LM
>>>>>>>> -external-bin-dir $external_bin_dir`;
>>>>>>>>
>>>>>>>> Here is a snippet from the produced moses.ini:
>>>>>>>> PhraseDictionaryMemory
>>>>>>>> name=TranslationModel0 table-limit=20 num-features=4
>>>>>>>> path=/usr1/projects/mt_proj/mt_eval/baselines/fr-base-1-lats/model/phrase-table.gz
>>>>>>>> input-factor=0 output-factor=0
>>>>>>>
>>>>>>> Yes, the phrase-table now has 4 scores, instead of 5. The 5th score was
>>>>>>> a constant 2.718. This has now moved into its own feature function,
>>>>>>> PhrasePenalty.
>>>>>>>
>>>>>>> it saves 3% of disk space, and i think it is better for research. eg.
>>>>>>> create better, non-constant phrase penalty feature functions, if we
>>>>>>> have 2 phrase tables do we need just 1 phrase penalty? etc.
>>>>>>>
>>>>>>>>
>>>>>>>> Second, I am trying to run tuning and decoding of lattices in plf
>>>>>>>> format.
>>>>>>>> Can you point me to example commands and moses.ini for running mert
>>>>>>>> and decoding lattices with the new Moses?
>>>>>>>
>>>>>>> an example ini file for lattices can be seen here
>>>>>>>
>>>>>>> https://github.com/moses-smt/moses-regression-tests/blob/master/tests/phrase.lattice-surface/moses.ini
>>>>>>>
>>>>>>> Mert should run like it always has. However, if you upgrade the
>>>>>>> decoder, you should use the upgraded mert script too.
>>>>>>>
>>>>>>> Decoding with a lattice is exactly the same as for a sentence, except for 2
>>>>>>> things
>>>>>>> 1. inputtype=2. This can be on the command line or in the ini file,
>>>>>>> eg.
>>>>>>> ./moses -inputtype 2
>>>>>>>
>>>>>>> or
>>>>>>> [inputtype]
>>>>>>> 2
>>>>>>>
>>>>>>> 2. You should use the InputFeature feature function. This is the
>>>>>>> score of the path through the lattice. You can see the InputFeature in
>>>>>>> the ini file:
>>>>>>> [feature]
>>>>>>> ....
>>>>>>> InputFeature num-features=1 num-input-features=1 real-word-count=0
>>>>>>>
>>>>>>> [weight]
>>>>>>> ...
>>>>>>> InputFeature0 = 1
>>>>>>>
>>>>>>> Before the refactoring, this was hacked in as an extra feature in
>>>>>>> the phrase-table
>>>>>>>>
>>>>>>>> So far I tried training and tuning on text files and decoding on
>>>>>>>> lattices because I could not figure out the right settings for tuning.
>>>>>>>> According to some old documentation I am supposed to convert the
>>>>>>>> phrase table to a binary format. Is it still needed?
>>>>>>>
>>>>>>> You no longer need to convert it to binary format. It's good to convert
>>>>>>> to binary format to save memory, but it is not required. Lattice
>>>>>>> decoding works with all phrase-table implementations now
>>>>>>>>
>>>>>>>> When I ran it with the following command:
>>>>>>>> moses -inputtype 2 -weight-i 0.62 -weight-l 12.5 -f
>>>>>>>> $tune_dir/moses.ini < $eval_dir/69.plf > $eval_dir/69.plf.out
>>>>>>>> I got an error:
>>>>>>>> Don't mix old and new ini file format
>>>>>>>> What is the new equivalent of weight-i and weight-l?
>>>>>>>
>>>>>>> -weight-i 0.62
>>>>>>> now becomes
>>>>>>> -weight-overwrite 'InputFeature0= 0.62'
>>>>>>>
>>>>>>> -weight-l 12.5
>>>>>>> now becomes
>>>>>>> -weight-overwrite 'LM0= 12.5'
>>>>>>>
>>>>>>> The updated mert script should be doing this anyway.
>>>>>>>>
>>>>>>>> Without those parameters I get a Segmentation Fault with both a .gz
>>>>>>>> and a binary phrase table.
>>>>>>>
>>>>>>> if you're still having problems, give me your ini file and the exact
>>>>>>> command you're executing and i'll try and debug it
>>>>>>>>
>>>>>>>> Could you help me figure out the right settings?
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Moses-support mailing list
>>>>>>>> [email protected]
>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>
>>>>>>> --
>>>>>>> Hieu Hoang
>>>>>>> Research Associate
>>>>>>> University of Edinburgh
>>>>>>> http://www.hoang.co.uk/hieu
>>>>>>
>>>>>> --
>>>>>> Hieu Hoang
>>>>>> Research Associate
>>>>>> University of Edinburgh
>>>>>> http://www.hoang.co.uk/hieu
>>>>
>>>> --
>>>> Hieu Hoang
>>>> Research Associate
>>>> University of Edinburgh
>>>> http://www.hoang.co.uk/hieu
>
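
The check suggested at the top of this thread (warn when lattice input is used while the phrase table is still being filtered) could be sketched roughly as below. This is only an illustration in Python, not code from mert-moses.pl: the function names `input_type` and `lattice_filter_warning` are hypothetical, and the only facts taken from the thread are that `--inputtype 2` means lattice input and that `--no-filter-phrase-table` turns filtering off.

```python
from typing import List, Optional

LATTICE_INPUT_TYPE = "2"                      # lattice input, as used in this thread
NO_FILTER_FLAG = "--no-filter-phrase-table"   # flag named in the thread


def input_type(args: List[str]) -> Optional[str]:
    """Return the value passed to --inputtype, or None if it is absent."""
    for i, arg in enumerate(args):
        if arg == "--inputtype" and i + 1 < len(args):
            return args[i + 1]
        if arg.startswith("--inputtype="):
            return arg.split("=", 1)[1]
    return None


def lattice_filter_warning(args: List[str]) -> Optional[str]:
    """Return a warning string if lattice input is combined with filtering."""
    if input_type(args) != LATTICE_INPUT_TYPE:
        return None  # not lattice input, nothing to warn about
    if NO_FILTER_FLAG in args:
        return None  # filtering already disabled
    return (
        "lattice input (--inputtype 2) without " + NO_FILTER_FLAG
        + ": the filtered phrase table may leave every word as UNK"
    )


if __name__ == "__main__":
    # Arguments modelled on the mert-moses.pl call quoted in this thread.
    tuning_args = [
        "./tuning/corpus.en", "./tuning/corpus.fr",
        "/home/ytsvetko/tools/mosesdecoder/bin/moses", "./moses.ini",
        "--working-dir", "./tuning",
        "--mertdir", "/home/ytsvetko/tools/mosesdecoder/mert",
        "--inputtype", "2",
    ]
    print(lattice_filter_warning(tuning_args))
    print(lattice_filter_warning(tuning_args + [NO_FILTER_FLAG]))
```

Run against the mert-moses.pl arguments quoted in the thread, the sketch returns a warning; appending --no-filter-phrase-table silences it. Whether the real script should warn or abort in this case is left open here.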
