hiya

Hieu Hoang
http://www.hoang.co.uk/hieu

On 13 October 2016 at 15:08, Vito Mandorino <
vito.mandor...@linguacustodia.com> wrote:

> We haven't checked the probingpt + minlexr speedup yet, however we have
> found some further differences in the output with respect to the standard
> Moses decoder.
>
> Sometimes the placeholders are replaced with the actual numbers in the
> wrong order. For instance:
>
> moses2 output: as of december 2012 , 31
> moses output: as of december 31 , 2012
>
> moses2 output: à jour au 2013 février 15
> moses output: à jour au 15 février 2013
>
> Is this the expected behavior?
>
no, they should work the same way. Could you send the model files and some
example input so I can replicate it?

>
> Another minor difference is the handling of the carriage return character
> ("\r"). It seems to be deleted by standard Moses and converted into a
> newline by Moses2.
>
there's no explicit handling of this in either moses or moses2, so whatever
each decoder happens to do with it is not guaranteed. You're better off
preprocessing the input to remove \r and other non-printing characters
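
A minimal sketch of that kind of preprocessing, assuming UTF-8 text piped through stdin with one sentence per line. The script and the `clean_line` name are illustrative and not part of Moses. It deletes control characters outright; if a stray "\r" can sit between two words in your data, replacing it with a space may be the safer choice.

```python
import re
import sys

# Control characters to drop: everything below 0x20 except tab (0x09) and
# newline (0x0a), plus DEL (0x7f). Carriage return (0x0d) is in this range.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def clean_line(line: str) -> str:
    """Remove carriage returns and other non-printing control characters."""
    return _CONTROL_CHARS.sub("", line)


def main() -> None:
    # newline="\n" stops Python from translating "\r" or "\r\n" into "\n"
    # on its own, which would silently split a line at a bare "\r".
    sys.stdin.reconfigure(encoding="utf-8", newline="\n")
    sys.stdout.reconfigure(encoding="utf-8", newline="\n")
    for line in sys.stdin:
        sys.stdout.write(clean_line(line))


if __name__ == "__main__":
    main()
```

Run it as a filter over the input file and feed the cleaned text to the decoder, so both moses and moses2 see exactly the same characters.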

>
> Best,
> Vito
>
> 2016-10-07 17:24 GMT+02:00 Hieu Hoang <hieuho...@gmail.com>:
>
>> yep, it should give you a big speedup compared to probingpt + minlexr
>> model
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 7 October 2016 at 16:21, Vito Mandorino <vito.mandorino@linguacustodia.com> wrote:
>>
>>> Yes I modified the line in the moses.ini . My comparison was with
>>> respect to probingPT + minlexr reordering model (rather than .gz reordering
>>> model)
>>>
>>> 2016-10-07 16:25 GMT+02:00 Hieu Hoang <hieuho...@gmail.com>:
>>>
>>>> weird. it should be a massive speedup (~500%). You have to change the
>>>> moses.ini file slightly
>>>>
>>>>   [feature]
>>>>   LexicalReordering … path=reordering-table.msd-bidirectional-fe.0.5.0-0.gz
>>>> to
>>>>   [feature]
>>>>   LexicalReordering … property-index=0
>>>>
>>>>
>>>> Hieu Hoang
>>>> http://www.hoang.co.uk/hieu
>>>>
>>>> On 7 October 2016 at 15:02, Vito Mandorino <
>>>> vito.mandor...@linguacustodia.com> wrote:
>>>>
>>>>> Yes, that worked for me as well, thank you. There is a little
>>>>> improvement in speed but not that much actually (about 5% faster using 30
>>>>> threads).
>>>>>
>>>>> 2016-10-04 11:44 GMT+02:00 Hieu Hoang <hieuho...@gmail.com>:
>>>>>
>>>>>> yes - the script expects the files to be gzipped.
>>>>>> It runs ok for me. I executed this:
>>>>>>
>>>>>>     MOSES_DIR=~/workspace/github/mosesdecoder.perf
>>>>>>
>>>>>>     $MOSES_DIR/scripts/generic/binarize4moses2.perl \
>>>>>>       --phrase-table=phrase-table.gz \
>>>>>>       --lex-ro=reordering-table.wbe-msd-bidirectional-fe.gz \
>>>>>>       --output-dir=integrated_phrase-reordering/ --num-lex-scores=6
>>>>>>
>>>>>> Got this:
>>>>>>
>>>>>>     Executing: gzip -dc phrase-table.gz | /home/hieu/workspace/github/mosesdecoder.perf/scripts/generic/../../contrib/sigtest-filter/filter-pt -n 0 | gzip -c > ./tmp.14373/pt.gz
>>>>>>     ...
>>>>>>     Reading phrase table finished, writing remaining files to disk.
>>>>>>
>>>>>> $ ll integrated_phrase-reordering/
>>>>>> total 24688
>>>>>> drwxrwxr-x 2 hieu hieu     4096 Oct  4 10:38 ./
>>>>>> drwxrwxr-x 5 hieu hieu     4096 Oct  4 10:42 ../
>>>>>> -rw-rw-r-- 1 hieu hieu   917861 Oct  4 10:42 Alignments.dat
>>>>>> -rw-rw-r-- 1 hieu hieu  2267885 Oct  4 10:42 cache
>>>>>> -rw-rw-r-- 1 hieu hieu       76 Oct  4 10:42 config
>>>>>> -rw-rw-r-- 1 hieu hieu  3146720 Oct  4 10:42 probing_hash.dat
>>>>>> -rw-rw-r-- 1 hieu hieu   333856 Oct  4 10:42 source_vocabids
>>>>>> -rw-rw-r-- 1 hieu hieu 18429920 Oct  4 10:42 TargetColl.dat
>>>>>> -rw-rw-r-- 1 hieu hieu   121401 Oct  4 10:42 TargetVocab.dat
>>>>>>
>>>>>>
>>>>>> On 04/10/2016 09:06, Vito Mandorino wrote:
>>>>>>
>>>>>> The command was
>>>>>>
>>>>>> perl /home/Moses/mosesdecoder/scripts/generic/binarize4moses2.perl
>>>>>> --phrase-table=/home/vito/phrase-table.sorted
>>>>>> --lex-ro=/home/vito/reordering-table.sorted
>>>>>> --output-dir=/home/vito/integrated_phrase-reordering/
>>>>>> --num-lex-scores=6
>>>>>>
>>>>>> The tables in the command are sorted with LC_ALL . I attach them in
>>>>>> .gz format. Should one use the .gz format also in the command above?
>>>>>>
>>>>>> Vito
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *M**. Vito MANDORINO -- Chief Scientist*
>>>>>
>>>>>
>>>>> [image: Description : Description : lingua_custodia_final full logo]
>>>>>
>>>>>  *The Translation Trustee*
>>>>>
>>>>> *1, Place Charles de Gaulle, **78180 Montigny-le-Bretonneux*
>>>>>
>>>>> *Tel : +33 1 30 44 04 23   Mobile : +33 6 84 65 68 89
>>>>> <%2B33%206%2084%2065%2068%2089>*
>>>>>
>>>>> *Email :*  *vito.mandor...@linguacustodia.com
>>>>> <massinissa.ah...@linguacustodia.com>*
>>>>>
>>>>> *Website :*
>>>>> *www.linguacustodia.finance <http://www.linguacustodia.com/>*
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
