Hi Marcin

But the search time that Jesús quotes shouldn't include any translation
option lookup, and therefore shouldn't benefit from phrase table caching,
should it?

cheers - Barry

On 12/03/15 11:00, Marcin Junczys-Dowmunt wrote:
>
> Hi Barry,
>
> I do have another cache used for decompression in my phrase table.
> Maybe that's the reason? It's being shared between threads, so I guess
> it gets filled up more quickly than the thread-specific caches. In
> other words: I am cheating :)
>
> On 2015-03-12 11:11, Barry Haddow wrote:
>
>> Hi Jesús
>>
>> As Marcin points out, when using the compact phrase table you need to allow
>> Moses time to cache the translation options for the common phrase pairs.
>> With the gzipped phrase table, it effectively caches the whole phrase table
>> during loading, but you excluded this 1800+ seconds from your calculations.
>>
>> I'm curious why the search time is twice as long for gzipped as opposed to
>> compact though (3.3s vs 1.6s). Once the translation options are loaded, they
>> should be doing the same thing shouldn't they? Maybe the reduced process
>> size with the compact phrase table gives the OS more space to cache LM
>> pages? I'm not sure how accurate the timings given by Moses are.
>>
>> cheers - Barry
>>
>>
>> On 11/03/15 19:31, Jesús González Rubio wrote:
>>> 2015-03-11 19:21 GMT+00:00 Marcin Junczys-Dowmunt <[email protected]>:
>>>
>>> Maybe someone will correct me, but if I am not wrong, the gzipped
>>> version already calculates the future score while loading (i.e. the
>>> phrase is being scored by the language model). The compact phrase
>>> table cannot do this during loading and does it on-line. This will be
>>> the reason for the slow speed. I suppose your phrase table has not
>>> been pruned? So, for instance, function words like "the" can have
>>> hundreds of thousands of counterparts that need to be scored by the
>>> LM during collection.
>>>
>>> That makes sense.
>>>
>>> You can limit your phrase table using Barry's prunePhraseTable tool.
>>> With this you can limit it to, say, the 20 best phrases (corresponds
>>> to the ttable limit) and only score these 20 phrases during
>>> collection. That should be orders of magnitude faster.
>>>
>>> OK.
>>>
>>> Best, Marcin
>>>
>>> On 11.03.2015 at 20:12, Jesús González Rubio wrote:
>>>> Thanks for the quick response, I will try as you suggest.
>>>> Nevertheless, my main concern is the time spent collecting options.
>>>> Is the difference observed with respect to the gzip'ed tables
>>>> normal? With the tables cached, shouldn't they be closer?
>>>>
>>>> 2015-03-11 18:52 GMT+00:00 Marcin Junczys-Dowmunt <[email protected]>:
>>>>
>>>> Hi,
>>>>
>>>> Try measuring the differences again after a full system reboot
>>>> (fresh reboot before each measurement) or after purging OS
>>>> read/write caches. Your phrase tables are most likely cached, which
>>>> means they are in fact in memory.
>>>>
>>>> Best, Marcin
>>>>
>>>> On 11.03.2015 at 19:31, Jesús González Rubio wrote:
>>>>> Hi,
>>>>>
>>>>> I'm obtaining some unintuitive timing results when using
>>>>> compact phrase tables. The average translation time per sentence
>>>>> is much higher for them in comparison to using gzip'ed phrase
>>>>> tables. Particularly important is the difference in time required
>>>>> to collect the options. This table summarizes the timings (in
>>>>> seconds):
>>>>>
>>>>>                      Compact           Gzip'ed
>>>>>                 on-disk  in-memory
>>>>> Init:             5.9       6.3         1882.8
>>>>> Per-sentence:
>>>>>  - Collect:       5.9       5.8            0.2
>>>>>  - Search:        1.6       1.6            3.3
>>>>>
>>>>> Results in the table were computed using Moses v2.1 with a single
>>>>> thread (-th 1) but I've seen similar results using the pre-compiled
>>>>> binary for Moses v3.0. The model comprises two phrase-tables (~2G
>>>>> and ~3M), two lexicalized reordering tables (~700M and ~1M) and
>>>>> two language models (~31G and ~38M). You can see the exact
>>>>> configuration in the attached moses.ini file.
>>>>>
>>>>> Interestingly, there is virtually no difference for the compact
>>>>> table between the on-disk and in-memory options. Additionally,
>>>>> timings were higher for the initial sentences in both cases, which
>>>>> I think should not be the case for the in-memory option. Could it
>>>>> be that the in-memory option of compact tables (-minpht-memory
>>>>> -minlexr-memory) is not working properly?
>>>>>
>>>>> Cheers.
>>>>> -- Jesús
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> [email protected]
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>> -- Jesús
>
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
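
To put the excluded loading time in perspective, here is a rough sketch of
the arithmetic using only the numbers from the timing table quoted above.
It assumes the per-sentence averages stay constant over a run, which the
thread itself suggests is not quite true while caches warm up, and it uses
the on-disk column for the compact table. The function names are made up
for this illustration and are not part of Moses.

```python
# Timings in seconds, copied from the table quoted in this thread.
# "compact" uses the on-disk column; the in-memory column is nearly identical.
TIMINGS = {
    "compact": {"init": 5.9, "collect": 5.9, "search": 1.6},
    "gzipped": {"init": 1882.8, "collect": 0.2, "search": 3.3},
}


def per_sentence(kind):
    """Average time per sentence: option collection plus search."""
    t = TIMINGS[kind]
    return t["collect"] + t["search"]


def total_time(kind, n_sentences):
    """Start-up time plus n sentences at a constant average rate (assumption)."""
    return TIMINGS[kind]["init"] + n_sentences * per_sentence(kind)


def break_even_sentences():
    """Number of sentences at which both set-ups have used the same total time."""
    init_gap = TIMINGS["gzipped"]["init"] - TIMINGS["compact"]["init"]
    rate_gap = per_sentence("compact") - per_sentence("gzipped")
    return init_gap / rate_gap


if __name__ == "__main__":
    print(f"break-even after about {break_even_sentences():.0f} sentences")
    for n in (100, 1000):
        print(n, round(total_time("compact", n), 1), round(total_time("gzipped", n), 1))
```

Under these assumptions the gzip'ed table only pays back its 1800+ seconds
of loading after roughly 470 sentences; for shorter runs the compact table
is faster overall despite the slower collection step.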
