Hi Marcin

But the search time that Jesús quotes shouldn't include any translation
option lookup, and therefore shouldn't benefit from phrase table
caching, should it?

cheers - Barry

On 12/03/15 11:00, Marcin Junczys-Dowmunt wrote:
>
> Hi Barry,
>
> I do have another cache used for decompression in my phrase table. 
> Maybe that's the reason? It's being shared between threads, so I guess 
> it gets filled up more quickly than the thread-specific caches. In 
> other words: I am cheating :)
>
> On 2015-03-12 11:11, Barry Haddow wrote:
>
>> Hi Jesús
>>
>> As Marcin points out, when using the compact phrase table you need to allow 
>> Moses time to cache the translation options for the common phrase pairs. 
>> With the gzipped phrase table, Moses effectively caches the whole phrase table
>> during loading, but you excluded those 1800+ seconds from your calculations.
>>
>> I'm curious why the search time is twice as long for gzipped as opposed to 
>> compact though (3.3s vs 1.6s). Once the translation options are loaded, they 
>> should be doing the same thing, shouldn't they? Maybe the reduced process
>> size with the compact phrase table gives the OS more space to cache LM 
>> pages? I'm not sure how accurate the timings given by Moses are.
>>
>> cheers - Barry
>>
>>
>> On 11/03/15 19:31, Jesús González Rubio wrote:
>>> 2015-03-11 19:21 GMT+00:00 Marcin Junczys-Dowmunt <[email protected]>:
>>>
>>>> Maybe someone will correct me, but if I am not wrong, the gzipped
>>>> version already calculates the future score while loading (i.e. the
>>>> phrase is being scored by the language model). The compact phrase
>>>> table cannot do this during loading and does it on-line instead. This
>>>> will be the reason for the slow speed. I suppose your phrase table has
>>>> not been pruned? So, for instance, function words like "the" can have
>>>> hundreds of thousands of counterparts that need to be scored by the LM
>>>> during collection.
>>>
>>> That makes sense.
>>>
>>>> You can limit your phrase table using Barry's prunePhraseTable tool.
>>>> With this you can limit it to, say, the 20 best phrases (corresponds to
>>>> the ttable limit) and only score these 20 phrases during collection.
>>>> That should be orders of magnitude faster.
>>>
>>> OK.
>>>
>>>> Best,
>>>> Marcin
>>>>
>>>> On 11.03.2015 at 20:12, Jesús González Rubio wrote:
>>>> Thanks for the quick response, I will try as you suggest.
>>>> Nevertheless, my main concern is the time spent collecting options.
>>>> Is the difference I observe with respect to the gzipped tables
>>>> normal? Since the tables are cached, shouldn't the timings be closer?
>>>>
>>>> 2015-03-11 18:52 GMT+00:00 Marcin Junczys-Dowmunt <[email protected]>:
>>>>
>>>>> Hi,
>>>>>
>>>>> Try measuring the differences again after a full system reboot (a
>>>>> fresh reboot before each measurement) or after purging the OS
>>>>> read/write caches. Your phrase tables are most likely cached, which
>>>>> means they are in fact in memory.
>>>>>
>>>>> Best,
>>>>> Marcin
>>>>>
>>>>> On 11.03.2015 at 19:31, Jesús González Rubio wrote:
>>>>> Hi,
>>>>>
>>>>> I'm obtaining some unintuitive timing results when using compact
>>>>> phrase tables. The average translation time per sentence is much
>>>>> higher for them than for gzipped phrase tables. The difference is
>>>>> particularly large in the time required to collect the options. This
>>>>> table summarizes the timings (in seconds):
>>>>>
>>>>>                 Compact              Gzipped
>>>>>                 on-disk   in-memory
>>>>> Init:            5.9       6.3        1882.8
>>>>> Per-sentence:
>>>>>   - Collect:     5.9       5.8           0.2
>>>>>   - Search:      1.6       1.6           3.3
>>>>>
>>>>> Results in the table were computed using Moses v2.1 with a single
>>>>> thread (-th 1), but I've seen similar results using the pre-compiled
>>>>> binary for Moses v3.0. The model comprises two phrase tables (~2G and
>>>>> ~3M), two lexicalized reordering tables (~700M and ~1M) and two
>>>>> language models (~31G and ~38M). You can see the exact configuration
>>>>> in the attached moses.ini file.
>>>>>
>>>>> Interestingly, there is virtually no difference for the compact table
>>>>> between the on-disk and in-memory options. Additionally, timings were
>>>>> higher for the initial sentences in both cases, which I think should
>>>>> not be the case for the in-memory option. Could it be that the
>>>>> in-memory option of compact tables (-minpht-memory -minlexr-memory) is
>>>>> not working properly?
>>>>>
>>>>> Cheers.
>>>>> -- Jesús
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> [email protected]
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>> -- Jesús
>

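The caching points raised in this thread (Barry: the compact table needs time to cache translation options for common phrases; Marcin: a decompression cache shared between threads fills up faster than thread-specific ones) can be illustrated with a minimal sketch. This is not Moses code: the class `SharedLRUCache`, the `slow_lookup` loader and the toy phrases are hypothetical, and a plain LRU eviction policy is assumed. Every first lookup misses, which would be consistent with the slower initial sentences Jesús observed.

```python
import threading
from collections import OrderedDict
from typing import Callable, List, Tuple


class SharedLRUCache:
    """LRU cache of translation options, shared by all worker threads (illustrative)."""

    def __init__(self, capacity: int, loader: Callable[[str], List[str]]):
        self.capacity = capacity
        self.loader = loader            # slow path: e.g. decompress and score options
        self._data = OrderedDict()      # phrase -> options, least recently used first
        self._lock = threading.Lock()   # one lock, because every thread shares the cache
        self.hits = 0
        self.misses = 0

    def get(self, phrase: str) -> List[str]:
        with self._lock:
            if phrase in self._data:
                self._data.move_to_end(phrase)  # mark as most recently used
                self.hits += 1
                return self._data[phrase]
            self.misses += 1
        # Do the expensive work outside the lock; two threads missing on the
        # same phrase at the same moment may both load it, which is harmless here.
        options = self.loader(phrase)
        with self._lock:
            self._data[phrase] = options
            self._data.move_to_end(phrase)
            if len(self._data) > self.capacity:
                self._data.popitem(last=False)  # evict the least recently used phrase
        return options


def slow_lookup(phrase: str) -> List[str]:
    # Hypothetical stand-in for a phrase-table lookup.
    return [f"{phrase}->t{i}" for i in range(3)]


def demo() -> Tuple[int, int]:
    cache = SharedLRUCache(capacity=1000, loader=slow_lookup)

    def translate_batch(batch: List[List[str]]) -> None:
        for sentence in batch:
            for phrase in sentence:
                cache.get(phrase)

    batch_a = [["the", "house"], ["the", "cat"]]
    batch_b = [["the", "house"], ["a", "dog"]]
    # Run the threads one after another so the counts are deterministic;
    # thread B still benefits from everything thread A put into the cache.
    for batch in (batch_a, batch_b):
        t = threading.Thread(target=translate_batch, args=(batch,))
        t.start()
        t.join()
    return cache.hits, cache.misses


if __name__ == "__main__":
    print(demo())  # prints: (3, 5)
```

With a separate cache per thread, thread B would miss on "the" and "house" again; sharing one cache is what lets it warm up faster, at the cost of lock contention.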

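The pruning Marcin suggests in the quoted thread (keep, say, the 20 best translations per source phrase, matching the ttable limit, so that only those need LM scoring during collection) amounts to a per-source top-k selection. The sketch below is a hypothetical illustration, not Barry's prunePhraseTable tool: `prune_table` is an invented name, and it assumes each phrase pair already carries one combined score (higher is better) rather than reproducing how the real tool ranks candidates.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One phrase-table entry: (source phrase, target phrase, combined score).
# Assumption: a single precomputed score per pair, higher is better.
Entry = Tuple[str, str, float]


def prune_table(entries: Iterable[Entry], limit: int = 20) -> Dict[str, List[Tuple[str, float]]]:
    """Keep only the `limit` best-scoring translations of each source phrase."""
    grouped: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for source, target, score in entries:
        grouped[source].append((target, score))
    # heapq.nlargest returns the top `limit` items, best first, in O(n log limit).
    return {
        source: heapq.nlargest(limit, candidates, key=lambda c: c[1])
        for source, candidates in grouped.items()
    }


if __name__ == "__main__":
    # Toy data (hypothetical): a function word with 100,000 candidate translations,
    # every one of which would otherwise have to be scored by the LM.
    toy = [("the", f"t{i}", -(i + 1) / 1000.0) for i in range(100_000)]
    toy += [("house", "Haus", -0.1), ("house", "Gebäude", -1.2)]
    pruned = prune_table(toy, limit=20)
    print(len(pruned["the"]), pruned["the"][0], len(pruned["house"]))
    # prints: 20 ('t0', -0.001) 2
```

Doing the selection once, offline, means collection scores 20 candidates for "the" instead of 100,000 in this toy case; phrases with fewer candidates than the limit are kept whole.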
-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
