Yes, there definitely should be a few checks in various places. I've got a list of recommendations to make lattice decoding a bit easier to get started with. We'll discuss this next week. -C
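Hieu (quoted below) suggests "a check of some sort in the mert script", and Chris agrees. Below is a minimal Python sketch of such a check. It assumes the check only needs to look at the mert-moses.pl argument list. It warns when `--inputtype 2` is given without `--no-filter-phrase-table`, which is the combination that caused the all-UNK output in this thread. The function name `lattice_filter_warning` is hypothetical and is not part of Moses.

```python
import shlex
from typing import List, Optional


def lattice_filter_warning(mert_args: List[str]) -> Optional[str]:
    """Warn when lattice input (--inputtype 2) is combined with phrase-table filtering.

    Hypothetical check: the flag names are the mert-moses.pl options used in
    this thread; the function itself is an illustration, not part of Moses.
    """
    inputtype: Optional[str] = None
    for i, arg in enumerate(mert_args):
        # Accept both "--inputtype 2" and "--inputtype=2" spellings.
        if arg == "--inputtype" and i + 1 < len(mert_args):
            inputtype = mert_args[i + 1]
        elif arg.startswith("--inputtype="):
            inputtype = arg.split("=", 1)[1]
    filtering_disabled = "--no-filter-phrase-table" in mert_args
    if inputtype == "2" and not filtering_disabled:
        return ("lattice input (--inputtype 2) with phrase-table filtering: "
                "the filter script may not understand lattices; "
                "consider adding --no-filter-phrase-table")
    return None


if __name__ == "__main__":
    # Yulia's tuning command from the thread, shortened (absolute paths and --mertdir dropped).
    cmd = ("mert-moses.pl ./tuning/corpus.en ./tuning/corpus.fr moses ./moses.ini "
           "--working-dir ./tuning --inputtype 2")
    warning = lattice_filter_warning(shlex.split(cmd))
    if warning:
        print("WARNING:", warning)
```

A check like this only catches the flag combination. It cannot tell whether a given filter script actually handles lattices, which is why it warns rather than aborts.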
On Fri, Sep 6, 2013 at 5:47 PM, Hieu Hoang <[email protected]> wrote:
> Good to know. I don't think it's obvious that you need that switch for lattice input. Maybe there should be a check of some sort in the mert script.
>
> Sent while bumping into things
>
> On 6 Sep 2013, at 15:42, Yulia Tsvetkov <[email protected]> wrote:
>
> Hi Hieu,
>
> A quick update: I should have used the --no-filter-phrase-table flag, otherwise the phrase table gets filtered. Thanks a lot for your help!!!!
>
> Yulia
>
> On Wed, Sep 4, 2013 at 12:34 PM, Hieu Hoang <[email protected]> wrote:
>>
>> Ok. If you're still stuck, please send me your phrase table and I'll try and debug it.
>>
>> Sent while bumping into things
>>
>> On 4 Sep 2013, at 17:07, Yulia Tsvetkov <[email protected]> wrote:
>>
>> phrase table is not empty, it looks normal, here is the snippet:
>>
>> no one ||| aucun de ceux ||| 1 0.00157474 0.0060241 5.00684e-06 ||| 0-0 0-1 1-2 ||| 1 166 1
>> no one ||| ce que personne ||| 0.5 3.7494e-05 0.0060241 5.89199e-06 ||| 0-0 1-2 ||| 2 166 1
>> no one ||| il que personne ||| 1 9.31515e-05 0.0060241 1.11289e-05 ||| 0-0 1-2 ||| 1 166 1
>> no one ||| n'est pas le seul ||| 0.0714286 0.0073779 0.0060241 4.54759e-07 ||| 0-0 0-1 1-3 ||| 14 166 1
>> no one ||| on ne ||| 0.00444444 0.000152764 0.0060241 0.000497078 ||| 1-0 0-1 ||| 225 166 1
>> no one ||| pas ||| 6.5066e-05 0.000267155 0.0060241 0.294497 ||| 0-0 ||| 15369 166 1
>>
>> i don't filter the phrase table...
>>
>> I'll debug more, and Chris was going to look at it too. I will send you an update.
>>
>> Thanks!
>>
>> Yulia
>>
>> On Wed, Sep 4, 2013 at 10:41 AM, Hieu Hoang <[email protected]> wrote:
>>>
>>> hmm, strange. the moses.ini file looks ok. There shouldn't be an issue with initialisation. Is the phrase-table empty?
>>>
>>> make sure you're not filtering the phrase table, i don't think the filter script understands lattices
>>>
>>> On 4 September 2013 15:10, Yulia Tsvetkov <[email protected]> wrote:
>>>>
>>>> Hi Hieu,
>>>>
>>>>> did you manage to get moses working with lattices again? it would be nice to get some feedback
>>>>
>>>> Sorry for not sending feedback earlier -- I was just trying to debug by myself before sending feedback or asking the next question...
>>>>
>>>> I was able to run a pipeline with the new settings, thanks a lot for the detailed answer!
>>>>
>>>> There is still a problem (with feature initialization?). Here is the first lattice translation; it looks like all input words are treated as OOVs (and they are not), and then MERT gets killed:
>>>>
>>>> BEST TRANSLATION: no|UNK|UNK|UNK one|UNK|UNK|UNK of|UNK|UNK|UNK the|UNK|UNK|UNK intense|UNK|UNK|UNK closures|UNK|UNK|UNK of|UNK|UNK|UNK travel|UNK|UNK|UNK and|UNK|UNK|UNK one|UNK|UNK|UNK of|UNK|UNK|UNK the|UNK|UNK|UNK delights|UNK|UNK|UNK of|UNK|UNK|UNK ethnographic|UNK|UNK|UNK research|UNK|UNK|UNK is|UNK|UNK|UNK the|UNK|UNK|UNK opportunity|UNK|UNK|UNK to|UNK|UNK|UNK live|UNK|UNK|UNK amongst|UNK|UNK|UNK those|UNK|UNK|UNK who|UNK|UNK|UNK have|UNK|UNK|UNK not|UNK|UNK|UNK forgotten|UNK|UNK|UNK the|UNK|UNK|UNK old|UNK|UNK|UNK ways|UNK|UNK|UNK to|UNK|UNK|UNK still|UNK|UNK|UNK feel|UNK|UNK|UNK their|UNK|UNK|UNK pass|UNK|UNK|UNK in|UNK|UNK|UNK the|UNK|UNK|UNK when|UNK|UNK|UNK touch|UNK|UNK|UNK and|UNK|UNK|UNK stones|UNK|UNK|UNK caused|UNK|UNK|UNK by|UNK|UNK|UNK rain|UNK|UNK|UNK tasted|UNK|UNK|UNK leaves|UNK|UNK|UNK of|UNK|UNK|UNK the|UNK|UNK|UNK bitter|UNK|UNK|UNK plants|UNK|UNK|UNK
>>>> [1111111111111111111111111111111111111111111111111111111111111] [total=-6405.459] core=(-6100.000,-50.000,61.000,0.000,0.000,0.000,0.000,-8.000,-1952.355,0.000)
>>>> Line 0: Translation took 0.000 seconds total
>>>> Translating line 1 in thread id 47061808453376
>>>> sh: line 1: 7333 Killed /home/ytsvetko/tools/mosesdecoder/bin/moses -config filtered/moses.ini -inputtype 2 -weight-overwrite 'InputFeature0= 0.066667 PhrasePenalty0= 0.066667 WordPenalty0= -0.333333 TranslationModel0= 0.066667 0.066667 0.066667 0.066667 Distortion0= 0.100000 LM0= 0.166667' -n-best-list run1.best100.out 100 -input-file /share/workhorse4/ytsvetko/projects/mt_proj/mt_eval/baselines/fr-base-1-lats/tuning/corpus.en > run1.out
>>>> Exit code: 137
>>>> The decoder died. CONFIG WAS -weight-overwrite 'InputFeature0= 0.066667 PhrasePenalty0= 0.066667 WordPenalty0= -0.333333 TranslationModel0= 0.066667 0.066667 0.066667 0.066667 Distortion0= 0.100000 LM0= 0.166667'
>>>>
>>>> I attach my config file, and here is the exact command that I am executing:
>>>>
>>>> mert-moses.pl ./tuning/corpus.en ./tuning/corpus.fr /home/ytsvetko/tools/mosesdecoder/bin/moses ./moses.ini --working-dir ./tuning --mertdir /home/ytsvetko/tools/mosesdecoder/mert --inputtype 2
>>>>
>>>> Thanks a lot for your help!
>>>> Yulia
>>>>
>>>>>
>>>>> On 2 September 2013 17:03, Hieu Hoang <[email protected]> wrote:
>>>>>>
>>>>>> Hi Yulia
>>>>>>
>>>>>> On 1 September 2013 22:46, Yulia Tsvetkov <[email protected]> wrote:
>>>>>>>
>>>>>>> Dear Moses developers,
>>>>>>>
>>>>>>> I am trying to use a new version of Moses. It seems like things have changed quite a bit, and I have a hard time finding up-to-date documentation. For debugging I used very small train/tune/test corpora (10 lines each).
>>>>>>>
>>>>>>> First, running the following command produces a phrase table with only 4 features:
>>>>>>> train-model.perl --root-dir $root_dir --corpus $root_dir/$corpus_name --f $src_lng --e $trg_lng --alignment grow-diag-final --lm 0:3:$LM -external-bin-dir $external_bin_dir
>>>>>>>
>>>>>>> Here is a snippet from the produced moses.ini:
>>>>>>> PhraseDictionaryMemory name=TranslationModel0 table-limit=20 num-features=4 path=/usr1/projects/mt_proj/mt_eval/baselines/fr-base-1-lats/model/phrase-table.gz input-factor=0 output-factor=0
>>>>>>
>>>>>> Yes, the phrase-table now has 4 scores instead of 5. The 5th score was a constant 2.718. This has now moved into its own feature function, PhrasePenalty.
>>>>>>
>>>>>> It saves 3% of disk space, and I think it is better for research, e.g. creating better, non-constant phrase penalty feature functions; if we have 2 phrase tables, do we need just 1 phrase penalty? etc.
>>>>>>>
>>>>>>> Second, I am trying to run tuning and decoding of lattices in plf format. Can you point me to example commands and a moses.ini for running mert and decoding lattices with the new Moses?
>>>>>>
>>>>>> an example ini file for lattices can be seen here
>>>>>>
>>>>>> https://github.com/moses-smt/moses-regression-tests/blob/master/tests/phrase.lattice-surface/moses.ini
>>>>>>
>>>>>> Mert should run as it always has. However, if you upgrade the decoder, you should use the upgraded mert script too.
>>>>>>
>>>>>> Decoding with lattices is exactly the same as for a sentence, except for 2 things:
>>>>>> 1. inputtype=2. This can be on the command line or in the ini file, e.g.
>>>>>> ./moses -inputtype 2
>>>>>> or
>>>>>> [inputtype]
>>>>>> 2
>>>>>>
>>>>>> 2. You should use the InputFeature feature function. This is the score of the path through the lattice.
>>>>>> You can see the InputFeature in the ini file:
>>>>>> [feature]
>>>>>> ....
>>>>>> InputFeature num-features=1 num-input-features=1 real-word-count=0
>>>>>>
>>>>>> [weight]
>>>>>> ...
>>>>>> InputFeature0 = 1
>>>>>>
>>>>>> Before the refactoring, this was hacked in as an extra feature in the phrase-table.
>>>>>>>
>>>>>>> So far I have tried training and tuning on text files and decoding on lattices, because I could not figure out the right settings for tuning. According to some old documentation I am supposed to convert the phrase table to a binary format. Is it still needed?
>>>>>>
>>>>>> You no longer need to convert it to binary format. It's good to convert to binary format to save memory, but it is not required. Lattice decoding works with all phrase-table implementations now.
>>>>>>>
>>>>>>> When I ran it with the following command:
>>>>>>> moses -inputtype 2 -weight-i 0.62 -weight-l 12.5 -f $tune_dir/moses.ini < $eval_dir/69.plf > $eval_dir/69.plf.out
>>>>>>> I got an error:
>>>>>>> Don't mix old and new ini file format
>>>>>>> What is the new equivalent of weight-i and weight-l?
>>>>>>
>>>>>> -weight-i 0.62
>>>>>> now becomes
>>>>>> -weight-overwrite 'InputFeature0= 0.62'
>>>>>>
>>>>>> -weight-l 12.5
>>>>>> now becomes
>>>>>> -weight-overwrite 'LM0= 12.5'
>>>>>>
>>>>>> The updated mert script should be doing this anyway.
>>>>>>>
>>>>>>> Without those parameters I get a Segmentation Fault with both a .gz and a binary phrase table.
>>>>>>
>>>>>> if you're still having problems, give me your ini file and the exact command you're executing and i'll try and debug it
>>>>>>>
>>>>>>> Could you help me figure out the right settings?
>>>>>>>
>>>>>>> Thanks in advance.
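Hieu's old-to-new mapping can be applied mechanically. Below is a hypothetical Python helper; `to_weight_overwrite` is not a Moses tool. It rewrites the two old flags into one `-weight-overwrite` argument. The entries are joined into a single quoted string in the `'Name= value Name= value'` form seen in the decoder command that mert-moses.pl logged earlier in the thread. Only `-weight-i` and `-weight-l` are mapped, and every other argument passes through unchanged.

```python
import shlex
from typing import Dict, List

# Old flag -> new feature name, as given in Hieu's reply.
# Only these two flags are covered; other old-style flags are not mapped here.
OLD_WEIGHT_FLAGS: Dict[str, str] = {
    "-weight-i": "InputFeature0",
    "-weight-l": "LM0",
}


def to_weight_overwrite(old_args: List[str]) -> List[str]:
    """Replace '-weight-i X' / '-weight-l Y' with one -weight-overwrite argument.

    Hypothetical helper: all mapped entries are combined into a single
    'Name= value Name= value' string, appended at the end of the arguments.
    """
    kept: List[str] = []
    entries: List[str] = []
    i = 0
    while i < len(old_args):
        arg = old_args[i]
        if arg in OLD_WEIGHT_FLAGS and i + 1 < len(old_args):
            entries.append(f"{OLD_WEIGHT_FLAGS[arg]}= {old_args[i + 1]}")
            i += 2  # consume the flag and its value
        else:
            kept.append(arg)
            i += 1
    if entries:
        kept += ["-weight-overwrite", " ".join(entries)]
    return kept


if __name__ == "__main__":
    # Yulia's failing command, with the shell redirections and $tune_dir omitted.
    old = shlex.split("moses -inputtype 2 -weight-i 0.62 -weight-l 12.5 -f moses.ini")
    print(shlex.join(to_weight_overwrite(old)))
    # moses -inputtype 2 -f moses.ini -weight-overwrite 'InputFeature0= 0.62 LM0= 12.5'
```

As Hieu notes, the updated mert script already does this rewriting during tuning. A helper like this only matters for hand-written decoding commands, such as the one that produced the "Don't mix old and new ini file format" error.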
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Moses-support mailing list
>>>>>>> [email protected]
>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>
>>>>>> --
>>>>>> Hieu Hoang
>>>>>> Research Associate
>>>>>> University of Edinburgh
>>>>>> http://www.hoang.co.uk/hieu
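The phrase-table lines Yulia quoted have five `|||`-separated fields. Hieu explains that the scores field now holds 4 values rather than 5, because the constant phrase penalty moved into its own feature function. Below is a minimal Python sketch that parses one such line and checks its score count against the `num-features=4` declared in the quoted moses.ini snippet. It assumes the five-field layout seen in the snippet and a source-target order for alignment points. The names `PhrasePair`, `parse_phrase_table_line` and `check_num_features` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PhrasePair:
    source: str
    target: str
    scores: List[float]
    alignment: List[Tuple[int, int]] = field(default_factory=list)
    counts: List[float] = field(default_factory=list)


def parse_phrase_table_line(line: str) -> PhrasePair:
    """Split one text phrase-table line on '|||' into its fields.

    Assumed layout (as in the snippet quoted in this thread):
    source ||| target ||| scores ||| alignment ||| counts
    """
    fields = [f.strip() for f in line.split("|||")]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 '|||'-separated fields: {line!r}")
    scores = [float(x) for x in fields[2].split()]
    alignment: List[Tuple[int, int]] = []
    if len(fields) > 3:
        for point in fields[3].split():
            # Assumed order: "0-1" = source word 0 aligned to target word 1.
            src, tgt = point.split("-")
            alignment.append((int(src), int(tgt)))
    # The counts column is kept as raw numbers; the thread does not discuss its meaning.
    counts = [float(x) for x in fields[4].split()] if len(fields) > 4 else []
    return PhrasePair(fields[0], fields[1], scores, alignment, counts)


def check_num_features(pair: PhrasePair, num_features: int = 4) -> None:
    """Raise if the score count disagrees with num-features in moses.ini."""
    if len(pair.scores) != num_features:
        raise ValueError(
            f"{pair.source!r} ||| {pair.target!r}: {len(pair.scores)} scores, "
            f"but moses.ini declares num-features={num_features}")


if __name__ == "__main__":
    line = ("no one ||| aucun de ceux ||| 1 0.00157474 0.0060241 5.00684e-06 "
            "||| 0-0 0-1 1-2 ||| 1 166 1")
    pair = parse_phrase_table_line(line)
    check_num_features(pair)  # passes: 4 scores, matching num-features=4
    print(pair.target, len(pair.scores), pair.alignment)
    # aucun de ceux 4 [(0, 0), (0, 1), (1, 2)]
```

A table built with the older 5-score layout would fail this check against `num-features=4`. That mismatch is the one Yulia first noticed in the training output.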
