Re: [Moses-support] Tuning and decoding of lattices in the new Moses.

Chris Dyer Fri, 06 Sep 2013 17:22:41 -0700

Yes, there definitely should be a few checks in various places. I've
got a list of recommendations to make lattice decoding a bit easier to
get started with. We'll discuss this next week.
-C


On Fri, Sep 6, 2013 at 5:47 PM, Hieu Hoang <[email protected]> wrote:
> Good to know. I don't think it's obvious that you need that switch for
> lattice input. Maybe there should be a check of some sort in the mert script.
>
> Sent while bumping into things
>
> On 6 Sep 2013, at 15:42, Yulia Tsvetkov <[email protected]> wrote:
>
> Hi Hieu,
>
> A quick update: I should have used the --no-filter-phrase-table flag;
> otherwise the phrase table gets filtered. Thanks a lot for your help!!!!
>
> Yulia
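
A minimal sketch of the fix described above: the tuning command quoted further down in this thread, with --no-filter-phrase-table added. The function name and the moses_home argument are assumptions for illustration; the remaining options are taken from the thread.

```shell
# Print the mert-moses.pl invocation from the thread with the flag that
# stops the phrase table from being filtered before lattice tuning.
# build_mert_cmd and its moses_home argument are hypothetical helpers.
build_mert_cmd() {
    moses_home=$1
    echo "mert-moses.pl ./tuning/corpus.en ./tuning/corpus.fr" \
         "$moses_home/bin/moses ./moses.ini" \
         "--working-dir ./tuning --mertdir $moses_home/mert" \
         "--inputtype 2 --no-filter-phrase-table"
}

# Usage (prints the command; review it before running it yourself):
#   build_mert_cmd /home/ytsvetko/tools/mosesdecoder
```

Printing the command instead of executing it keeps the sketch safe to try on a machine without Moses installed.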
>
>
> On Wed, Sep 4, 2013 at 12:34 PM, Hieu Hoang <[email protected]> wrote:
>>
>> Ok. If you're still stuck, please send me your phrase table and I'll try to
>> debug it.
>>
>> Sent while bumping into things
>>
>> On 4 Sep 2013, at 17:07, Yulia Tsvetkov <[email protected]> wrote:
>>
>> phrase table is not empty, it looks normal, here is the snippet:
>>
>> no one ||| aucun de ceux ||| 1 0.00157474 0.0060241 5.00684e-06 ||| 0-0 0-1 1-2 ||| 1 166 1
>> no one ||| ce que personne ||| 0.5 3.7494e-05 0.0060241 5.89199e-06 ||| 0-0 1-2 ||| 2 166 1
>> no one ||| il que personne ||| 1 9.31515e-05 0.0060241 1.11289e-05 ||| 0-0 1-2 ||| 1 166 1
>> no one ||| n'est pas le seul ||| 0.0714286 0.0073779 0.0060241 4.54759e-07 ||| 0-0 0-1 1-3 ||| 14 166 1
>> no one ||| on ne ||| 0.00444444 0.000152764 0.0060241 0.000497078 ||| 1-0 0-1 ||| 225 166 1
>> no one ||| pas ||| 6.5066e-05 0.000267155 0.0060241 0.294497 ||| 0-0 ||| 15369 166 1
>>
>> i don't filter the phrase table...
>>
>> I'll debug more, and Chris was going to look at it too, I will send you an
>> update.
>>
>> Thanks!
>>
>> Yulia
>>
>>
>>
>> On Wed, Sep 4, 2013 at 10:41 AM, Hieu Hoang <[email protected]> wrote:
>>>
>>> Hmm, strange. The moses.ini file looks OK. There shouldn't be an issue
>>> with initialisation. Is the phrase table empty?
>>>
>>> Make sure you're not filtering the phrase table; I don't think the filter
>>> script understands lattices.
>>>
>>>
>>>
>>>
>>> On 4 September 2013 15:10, Yulia Tsvetkov <[email protected]>
>>> wrote:
>>>>
>>>> Hi Hieu,
>>>>
>>>>> did you manage to get moses working with lattices again? it would be
>>>>> nice to get some feedback
>>>>
>>>> Sorry for not sending feedback earlier -- I was just trying to debug by
>>>> myself before sending feedback or asking the next question...
>>>>
>>>> I was able to run a pipeline with the new settings, thanks a lot for the
>>>> detailed answer!
>>>>
>>>> There is still a problem (with feature initialization?), here is the
>>>> first lattice translation, looks like all input words are treated as OOVs
>>>> (and they are not), and then MERT gets killed:
>>>>
>>>> BEST TRANSLATION: no|UNK|UNK|UNK one|UNK|UNK|UNK of|UNK|UNK|UNK
>>>> the|UNK|UNK|UNK intense|UNK|UNK|UNK closures|UNK|UNK|UNK of|UNK|UNK|UNK
>>>> travel|UNK|UNK|UNK and|UNK|UNK|UNK one|UNK|UNK|UNK of|UNK|UNK|UNK
>>>> the|UNK|UNK|UNK delights|UNK|UNK|UNK of|UNK|UNK|UNK 
>>>> ethnographic|UNK|UNK|UNK
>>>> research|UNK|UNK|UNK is|UNK|UNK|UNK the|UNK|UNK|UNK opportunity|UNK|UNK|UNK
>>>> to|UNK|UNK|UNK live|UNK|UNK|UNK amongst|UNK|UNK|UNK those|UNK|UNK|UNK
>>>> who|UNK|UNK|UNK have|UNK|UNK|UNK not|UNK|UNK|UNK forgotten|UNK|UNK|UNK
>>>> the|UNK|UNK|UNK old|UNK|UNK|UNK ways|UNK|UNK|UNK to|UNK|UNK|UNK
>>>> still|UNK|UNK|UNK feel|UNK|UNK|UNK their|UNK|UNK|UNK pass|UNK|UNK|UNK
>>>> in|UNK|UNK|UNK the|UNK|UNK|UNK when|UNK|UNK|UNK touch|UNK|UNK|UNK
>>>> and|UNK|UNK|UNK stones|UNK|UNK|UNK caused|UNK|UNK|UNK by|UNK|UNK|UNK
>>>> rain|UNK|UNK|UNK tasted|UNK|UNK|UNK leaves|UNK|UNK|UNK of|UNK|UNK|UNK
>>>> the|UNK|UNK|UNK bitter|UNK|UNK|UNK plants|UNK|UNK|UNK
>>>> [1111111111111111111111111111111111111111111111111111111111111]
>>>> [total=-6405.459]
>>>> core=(-6100.000,-50.000,61.000,0.000,0.000,0.000,0.000,-8.000,-1952.355,0.000)
>>>> Line 0: Translation took 0.000 seconds total
>>>> Translating line 1  in thread id 47061808453376
>>>> sh: line 1:  7333 Killed
>>>> /home/ytsvetko/tools/mosesdecoder/bin/moses -config filtered/moses.ini
>>>> -inputtype 2 -weight-overwrite 'InputFeature0= 0.066667 PhrasePenalty0=
>>>> 0.066667 WordPenalty0= -0.333333 TranslationModel0= 0.066667 0.066667
>>>> 0.066667 0.066667 Distortion0= 0.100000 LM0= 0.166667' -n-best-list
>>>> run1.best100.out 100 -input-file
>>>> /share/workhorse4/ytsvetko/projects/mt_proj/mt_eval/baselines/fr-base-1-lats/tuning/corpus.en
>>>> > run1.out
>>>> Exit code: 137
>>>> The decoder died. CONFIG WAS -weight-overwrite 'InputFeature0= 0.066667
>>>> PhrasePenalty0= 0.066667 WordPenalty0= -0.333333 TranslationModel0= 0.066667
>>>> 0.066667 0.066667 0.066667 Distortion0= 0.100000 LM0= 0.166667'
>>>>
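
The exit code in the log above can be decoded. By shell convention, a status above 128 means the process was terminated by a signal whose number is the status minus 128, so 137 corresponds to signal 9 (SIGKILL), which matches the "Killed" line. One common cause is the kernel's out-of-memory killer, though the log alone does not confirm that. A small sketch, with a hypothetical function name:

```shell
# Map a shell exit status to the signal number that terminated the
# process, or 0 if the status does not indicate death by signal.
# signal_from_exit_code is a hypothetical helper, not part of Moses.
signal_from_exit_code() {
    code=$1
    if [ "$code" -gt 128 ]; then
        echo $((code - 128))
    else
        echo 0
    fi
}

# Usage:
#   signal_from_exit_code 137
```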
>>>> I attach my config file, and here is the exact command that I am
>>>> executing:
>>>>
>>>> mert-moses.pl ./tuning/corpus.en ./tuning/corpus.fr
>>>> /home/ytsvetko/tools/mosesdecoder/bin/moses ./moses.ini --working-dir
>>>> ./tuning --mertdir /home/ytsvetko/tools/mosesdecoder/mert --inputtype 2
>>>>
>>>>
>>>> Thanks a lot for your help!
>>>> Yulia
>>>>
>>>>
>>>>>
>>>>>
>>>>> On 2 September 2013 17:03, Hieu Hoang <[email protected]> wrote:
>>>>>>
>>>>>> Hi Yulia
>>>>>>
>>>>>>
>>>>>> On 1 September 2013 22:46, Yulia Tsvetkov <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Dear Moses developers,
>>>>>>>
>>>>>>> I am trying to use a new version of Moses. It seems things have
>>>>>>> changed quite a bit, and I am having a hard time finding up-to-date
>>>>>>> documentation. For debugging I used very small train/tune/test corpora
>>>>>>> (10 lines each).
>>>>>>>
>>>>>>> First, running the following command produces a phrase table
>>>>>>> with only 4 features:
>>>>>>> train-model.perl --root-dir $root_dir --corpus $root_dir/$corpus_name
>>>>>>> --f $src_lng --e $trg_lng --alignment grow-diag-final --lm 0:3:$LM
>>>>>>> -external-bin-dir $external_bin_dir
>>>>>>>
>>>>>>> Here is a snippet from the produced moses.ini:
>>>>>>> PhraseDictionaryMemory name=TranslationModel0 table-limit=20 num-features=4
>>>>>>> path=/usr1/projects/mt_proj/mt_eval/baselines/fr-base-1-lats/model/phrase-table.gz
>>>>>>> input-factor=0 output-factor=0
>>>>>>
>>>>>>
>>>>>> Yes, the phrase table now has 4 scores instead of 5. The 5th score
>>>>>> was a constant 2.718. This has now moved into its own feature function,
>>>>>> PhrasePenalty.
>>>>>>
>>>>>> It saves 3% of disk space, and I think it is better for research, e.g. to
>>>>>> create better, non-constant phrase penalty feature functions, or to ask
>>>>>> whether we need just 1 phrase penalty if we have 2 phrase tables.
>>>>>>
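
To check whether a given phrase table uses the 4-score or the older 5-score layout, one can count the values in the third "|||"-separated field. This is a sketch that assumes the text format shown in the snippet elsewhere in this thread; count_scores is a hypothetical helper name.

```shell
# Read phrase-table lines on stdin and print the number of scores
# (third ' ||| '-separated field) for each line.
# count_scores is a hypothetical helper, not a Moses tool.
count_scores() {
    awk -F' [|][|][|] ' '{ print split($3, scores, " ") }'
}

# Usage on a gzipped table (path is illustrative):
#   zcat model/phrase-table.gz | head -n 1 | count_scores
```

The printed count should agree with the num-features value on the phrase table line in moses.ini.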
>>>>>>>
>>>>>>> Second, I am trying to run tuning and decoding of lattices in plf
>>>>>>> format.
>>>>>>> Can you point me to example commands and moses.ini for running mert
>>>>>>> and decoding lattices with the new Moses?
>>>>>>
>>>>>> an example ini file for lattices can be seen here
>>>>>>
>>>>>> https://github.com/moses-smt/moses-regression-tests/blob/master/tests/phrase.lattice-surface/moses.ini
>>>>>>
>>>>>> Mert should run as it always has. However, if you upgrade the
>>>>>> decoder, you should use the upgraded mert script too.
>>>>>>
>>>>>> Decoding with a lattice is exactly the same as for a sentence, except for
>>>>>> 2 things:
>>>>>>    1. inputtype=2. This can be on the command line or in the ini file,
>>>>>> e.g.
>>>>>>            ./moses -inputtype 2
>>>>>>
>>>>>>        or
>>>>>>             [inputtype]
>>>>>>             2
>>>>>>
>>>>>>    2. You should use the InputFeature feature function. This is the
>>>>>> score of the path through the lattice. You can see the InputFeature in
>>>>>> the ini file:
>>>>>>       [feature]
>>>>>>       ....
>>>>>>       InputFeature num-features=1 num-input-features=1 real-word-count=0
>>>>>>
>>>>>>       [weight]
>>>>>>       ...
>>>>>>       InputFeature0 = 1
>>>>>>
>>>>>>    Before the refactoring, this was hacked in as an extra feature in
>>>>>> the phrase table.
>>>>>>>
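
The second requirement above can be checked before launching the decoder. The sketch below only greps an ini file for an InputFeature line and an InputFeature0 weight; it does not validate section placement or any other setting, and check_lattice_ini is a hypothetical helper name.

```shell
# Report whether an ini file declares the InputFeature feature function
# and a matching InputFeature0 weight, as lattice decoding expects.
# check_lattice_ini is a hypothetical helper, not part of Moses.
check_lattice_ini() {
    ini=$1
    if ! grep -Eq '^[[:space:]]*InputFeature[[:space:]]' "$ini"; then
        echo "missing InputFeature line"
        return 1
    fi
    if ! grep -Eq '^[[:space:]]*InputFeature0[[:space:]]*=' "$ini"; then
        echo "missing InputFeature0 weight"
        return 1
    fi
    echo "ok"
}

# Usage:
#   check_lattice_ini ./moses.ini
```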
>>>>>>>
>>>>>>> So far I tried training and tuning on text files and decoding on
>>>>>>> lattices because I could not figure out the right settings for tuning.
>>>>>>> According to some old documentation I am supposed to convert the
>>>>>>> phrase table to a binary format. Is it still needed?
>>>>>>
>>>>>> You no longer need to convert it to binary format. It's good to
>>>>>> convert to binary format to save memory, but it is not required. Lattice
>>>>>> decoding now works with all phrase-table implementations.
>>>>>>>
>>>>>>>
>>>>>>> When I ran it with the following command:
>>>>>>> moses -inputtype 2 -weight-i 0.62 -weight-l 12.5 -f
>>>>>>> $tune_dir/moses.ini < $eval_dir/69.plf > $eval_dir/69.plf.out
>>>>>>> I got an error:
>>>>>>> Don't mix old and new ini file format
>>>>>>> What is the new equivalent of weight-i and weight-l?
>>>>>>
>>>>>>
>>>>>>    -weight-i 0.62
>>>>>> now becomes
>>>>>>    -weight-overwrite 'InputFeature0= 0.62'
>>>>>>
>>>>>>   -weight-l 12.5
>>>>>> now becomes
>>>>>>    -weight-overwrite 'LM0= 12.5'
>>>>>>
>>>>>> The updated mert script should be doing this anyway.
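
Putting the two replacements together, the old -weight-i and -weight-l values can be passed as a single -weight-overwrite string, in the same combined form the mert log elsewhere in this thread shows. A minimal sketch; weight_overwrite_arg is a hypothetical helper name, and the feature names InputFeature0 and LM0 are the ones used in this thread's ini file.

```shell
# Build the -weight-overwrite value that replaces the old
# -weight-i (first argument) and -weight-l (second argument) options.
# weight_overwrite_arg is a hypothetical helper, not part of Moses.
weight_overwrite_arg() {
    printf 'InputFeature0= %s LM0= %s\n' "$1" "$2"
}

# Usage, mirroring the command from the original question:
#   moses -inputtype 2 \
#       -weight-overwrite "$(weight_overwrite_arg 0.62 12.5)" \
#       -f $tune_dir/moses.ini < $eval_dir/69.plf > $eval_dir/69.plf.out
```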
>>>>>>>
>>>>>>>
>>>>>>> Without those parameters I get a Segmentation Fault with both a .gz
>>>>>>> and a binary phrase table.
>>>>>>
>>>>>>
>>>>>> If you're still having problems, give me your ini file and the exact
>>>>>> command you're executing and I'll try to debug it.
>>>>>>>
>>>>>>>
>>>>>>> Could you help me figure out the right settings?
>>>>>>>
>>>>>>> Thanks in advance.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Moses-support mailing list
>>>>>>> [email protected]
>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Hieu Hoang
>>>>>> Research Associate
>>>>>> University of Edinburgh
>>>>>> http://www.hoang.co.uk/hieu
>>>>>>
>>>>>
>>>>
>>>
>>
>