hiya

Hieu Hoang
http://www.hoang.co.uk/hieu
On 13 October 2016 at 15:08, Vito Mandorino <[email protected]> wrote:

> We haven't checked the probingpt + minlexr speedup yet, however we have
> found some further differences in the output with respect to the standard
> Moses decoder.
>
> It happens sometimes that the order of replacement of placeholders with
> actual numbers is not the good one. For instance :
>
> moses2 output: as of december 2012 , 31
> moses output: as of december 31 , 2012
>
> moses2 output: à jour au 2013 février 15
> moses output: à jour au 15 février 2013
>
> Is this the expected behavior?
>
no, they should work the same way. Model files and example input would be
good so I can replicate it

>
> Another minor difference is the handling of the carriage return character
> ("\r") . It seems to be deleted by standard Moses and converted into
> newline by Moses2.
>
there's no explicit handling of this in either moses or moses2. Whatever
happens is not guaranteed to happen. You're better off preprocessing to
remove \r and other non-printing characters

>
> Best,
> Vito
>
> 2016-10-07 17:24 GMT+02:00 Hieu Hoang <[email protected]>:
>
>> yep, it should give you a big speedup compared to probingpt + minlexr
>> model
>>
>> Hieu Hoang
>> http://www.hoang.co.uk/hieu
>>
>> On 7 October 2016 at 16:21, Vito Mandorino <vito.mandorino@linguacustodia.com> wrote:
>>
>>> Yes I modified the line in the moses.ini . My comparison was with
>>> respect to probingPT + minlexr reordering model (rather than .gz reordering
>>> model)
>>>
>>> 2016-10-07 16:25 GMT+02:00 Hieu Hoang <[email protected]>:
>>>
>>>> weird. it should be a massive speedup (~500%). You have to change the
>>>> moses.ini file slightly
>>>>
>>>> [feature]
>>>> LexicalReordering … path=reordering-table.msd-bidirectional-fe.0.5.0-0.gz
>>>> to
>>>> [feature]
>>>> LexicalReordering … property-index=0
>>>>
>>>>
>>>> Hieu Hoang
>>>> http://www.hoang.co.uk/hieu
>>>>
>>>> On 7 October 2016 at 15:02, Vito Mandorino <[email protected]> wrote:
>>>>
>>>>> Yes, that worked for me as well, thank you. There is a little
>>>>> improvement in speed but not that much actually (about 5% faster using 30
>>>>> threads).
>>>>>
>>>>> 2016-10-04 11:44 GMT+02:00 Hieu Hoang <[email protected]>:
>>>>>
>>>>>> yes - the script expects the files to be gzipped.
>>>>>> It runs ok for me. I executed this:
>>>>>>
>>>>>> MOSES_DIR=~/workspace/github/mosesdecoder.perf
>>>>>>
>>>>>> $MOSES_DIR/scripts/generic/binarize4moses2.perl
>>>>>> --phrase-table=phrase-table.gz
>>>>>> --lex-ro=reordering-table.wbe-msd-bidirectional-fe.gz
>>>>>> --output-dir=integrated_phrase-reordering/ --num-lex-scores=6
>>>>>>
>>>>>> Got this:
>>>>>>
>>>>>> Executing: gzip -dc phrase-table.gz | /home/hieu/workspace/github/mosesdecoder.perf/scripts/generic/../../contrib/sigtest-filter/filter-pt -n 0 | gzip -c > ./tmp.14373/pt.gz
>>>>>> ...
>>>>>> Reading phrase table finished, writing remaining files to disk.
>>>>>>
>>>>>> $ ll integrated_phrase-reordering/
>>>>>> total 24688
>>>>>> drwxrwxr-x 2 hieu hieu     4096 Oct 4 10:38 ./
>>>>>> drwxrwxr-x 5 hieu hieu     4096 Oct 4 10:42 ../
>>>>>> -rw-rw-r-- 1 hieu hieu   917861 Oct 4 10:42 Alignments.dat
>>>>>> -rw-rw-r-- 1 hieu hieu  2267885 Oct 4 10:42 cache
>>>>>> -rw-rw-r-- 1 hieu hieu       76 Oct 4 10:42 config
>>>>>> -rw-rw-r-- 1 hieu hieu  3146720 Oct 4 10:42 probing_hash.dat
>>>>>> -rw-rw-r-- 1 hieu hieu   333856 Oct 4 10:42 source_vocabids
>>>>>> -rw-rw-r-- 1 hieu hieu 18429920 Oct 4 10:42 TargetColl.dat
>>>>>> -rw-rw-r-- 1 hieu hieu   121401 Oct 4 10:42 TargetVocab.dat
>>>>>>
>>>>>>
>>>>>> On 04/10/2016 09:06, Vito Mandorino wrote:
>>>>>>
>>>>>> The command was
>>>>>>
>>>>>> perl /home/Moses/mosesdecoder/scripts/generic/binarize4moses2.perl
>>>>>> --phrase-table=/home/vito/phrase-table.sorted
>>>>>> --lex-ro=/home/vito/reordering-table.sorted
>>>>>> --output-dir=/home/vito/integrated_phrase-reordering/
>>>>>> --num-lex-scores=6
>>>>>>
>>>>>> The tables in the command are sorted with LC_ALL . I attach them in
>>>>>> .gz format. Should one use the .gz format also in the command above?
>>>>>>
>>>>>> Vito
>>>>>>
>>>>>
>>>>> --
>>>>> M. Vito MANDORINO -- Chief Scientist
>>>>>
>>>>> The Translation Trustee
>>>>>
>>>>> 1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
>>>>>
>>>>> Tel : +33 1 30 44 04 23 Mobile : +33 6 84 65 68 89
>>>>>
>>>>> Email : [email protected]
>>>>>
>>>>> Website : www.linguacustodia.finance <http://www.linguacustodia.com/>
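
For the preprocessing suggested in the reply above (removing \r and other non-printing characters before the text reaches the decoder), a minimal Python sketch might look like the following. The function name `clean_line` is hypothetical, and treating only Unicode control characters (category `Cc`) as "non-printing" is an assumption made for illustration; neither comes from Moses or Moses2.

```python
import unicodedata


def clean_line(text: str) -> str:
    """Remove "\r" and other control characters from decoder input.

    Assumptions (not taken from Moses itself):
    - input is one sentence per line, so "\n" is preserved;
    - tabs are replaced by a single space rather than dropped;
    - "non-printing" means Unicode general category Cc (control).
    """
    out = []
    for ch in text:
        if ch == "\n":
            out.append(ch)       # keep the sentence boundary
        elif ch == "\t":
            out.append(" ")      # keep the token boundary
        elif unicodedata.category(ch) == "Cc":
            continue             # drops "\r", NUL, BEL, etc.
        else:
            out.append(ch)
    return "".join(out)
```

Applied to every input line before decoding, this should give moses and moses2 the same text to work with, so neither decoder's incidental treatment of \r matters.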
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
