Hi Jesús,

As Marcin points out, when using the compact phrase table you need to allow Moses time to cache the translation options for the common phrase pairs. With the gzipped phrase table, Moses effectively caches the whole phrase table during loading, but you excluded those 1800+ seconds of loading time from your per-sentence calculations.
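To put a number on the loading-time point, here is a minimal sketch that folds the one-off init time into the total, using the figures from Jesús's table quoted further down. The function names are made up for illustration, and the model assumes the per-sentence averages stay constant, which is exactly what caching would change for the compact table, so treat the result as a rough upper bound rather than a measurement.

```python
import math

# Timings in seconds, copied from the table in Jesús's original message.
COMPACT_ON_DISK = {"init": 5.9, "collect": 5.9, "search": 1.6}
GZIPPED = {"init": 1882.8, "collect": 0.2, "search": 3.3}


def total_time(timing, n_sentences):
    """Estimate: one-off init plus a constant per-sentence cost."""
    per_sentence = timing["collect"] + timing["search"]
    return timing["init"] + n_sentences * per_sentence


def break_even_sentences(slow_start, fast_start):
    """Smallest sentence count at which `slow_start` is no slower overall.

    Assumption (not from Moses): per-sentence averages never change.
    Returns None if `slow_start` never catches up under that assumption.
    """
    per_slow = slow_start["collect"] + slow_start["search"]
    per_fast = fast_start["collect"] + fast_start["search"]
    if per_slow >= per_fast:
        return None
    init_gap = slow_start["init"] - fast_start["init"]
    return max(0, math.ceil(init_gap / (per_fast - per_slow)))


if __name__ == "__main__":
    n = break_even_sentences(GZIPPED, COMPACT_ON_DISK)
    print("gzipped catches up after", n, "sentences")
    print("gzipped total:", total_time(GZIPPED, n))
    print("compact total:", total_time(COMPACT_ON_DISK, n))
```

Under these assumptions the gzipped table only pays off after roughly 470 sentences, and later if the compact table speeds up once its cache is warm.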

I'm curious why the search time is twice as long for gzipped as opposed to compact, though (3.3s vs 1.6s). Once the translation options are loaded, they should be doing the same thing, shouldn't they? Maybe the reduced process size with the compact phrase table gives the OS more space to cache LM pages? I'm not sure how accurate the timings given by Moses are.

cheers - Barry

On 11/03/15 19:31, Jesús González Rubio wrote:
>
> 2015-03-11 19:21 GMT+00:00 Marcin Junczys-Dowmunt <[email protected]>:
>
>     Maybe someone will correct me, but if I am not wrong, the gzipped
>     version already calculates the future score while loading (i.e.
>     the phrase is being scored by the language model). The compact
>     phrase table cannot do this during loading and does it on-line.
>     This will be the reason for the slow speed. I suppose your phrase
>     table has not been pruned? So, for instance, function words like
>     "the" can have hundreds of thousands of counterparts that need to
>     be scored by the LM during collection.
>
> That makes sense.
>
>     You can limit your phrase table using Barry's prunePhraseTable
>     tool. With this you can limit it to, say, the 20 best phrases
>     (corresponds to the ttable limit) and only score these 20 phrases
>     during collection. That should be orders of magnitude faster.
>
> OK.
>
>     Best,
>     Marcin
>
>     On 11.03.2015 at 20:12, Jesús González Rubio wrote:
>>     Thanks for the quick response, I will try as you suggest.
>>
>>     Nevertheless, my main concern is the time spent collecting
>>     options. Is the difference observed with respect to the
>>     gzip'ed tables normal? With the tables cached, shouldn't they
>>     be closer?
>>
>>     2015-03-11 18:52 GMT+00:00 Marcin Junczys-Dowmunt
>>     <[email protected]>:
>>
>>         Hi,
>>         Try measuring the differences again after a full system
>>         reboot (fresh reboot before each measurement) or after
>>         purging OS read/write caches.
>>         Your phrase tables are most likely
>>         cached, which means they are in fact in memory.
>>         Best,
>>         Marcin
>>
>>         On 11.03.2015 at 19:31, Jesús González Rubio wrote:
>>>         Hi,
>>>
>>>         I'm obtaining some unintuitive timing results when using
>>>         compact phrase tables. The average translation time per
>>>         sentence is much higher for them in comparison to using
>>>         gzip'ed phrase tables. Particularly important is the
>>>         difference in time required to collect the options. This
>>>         table summarizes the timings (in seconds):
>>>
>>>                               Compact            Gzip'ed
>>>                          on-disk  in-memory
>>>         Init:                5.9        6.3       1882.8
>>>         Per-sentence:
>>>         - Collect:           5.9        5.8          0.2
>>>         - Search:            1.6        1.6          3.3
>>>
>>>         Results in the table were computed using Moses v2.1 with one
>>>         single thread (-th 1), but I've seen similar results using
>>>         the pre-compiled binary for Moses v3.0. The model comprises
>>>         two phrase tables (~2G and ~3M), two lexicalized reordering
>>>         tables (~700M and ~1M) and two language models (~31G and
>>>         ~38M). You can see the exact configuration in the attached
>>>         moses.ini file.
>>>
>>>         Interestingly, there is virtually no difference for the
>>>         compact table between the on-disk and in-memory options.
>>>         Additionally, timings were higher for the initial sentences
>>>         in both cases, which I think should not be the case for the
>>>         in-memory option.
>>>
>>>         Could it be that the in-memory option of compact tables
>>>         (-minpht-memory -minlexr-memory) is not working properly?
>>>
>>>         Cheers.
>>>         --
>>>         Jesús
>>>
>>>         _______________________________________________
>>>         Moses-support mailing list
>>>         [email protected]
>>>         http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> Jesús

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
