Hi Marcin,

On Monday 19 September 2011 07:58:48 Marcin Junczys-Dowmunt wrote:
> The binary implementation seems to become unusable with more than 10-12
> threads. Speed drops as more threads are used until it nearly deadlocks
> at around 30 threads. I am using a 48-core server with 512 GB ram. Even
> copying the binary phrase tables to a ramdisk does not solve the
> problem. The behavior stays the same. The in-memory version works fine
> with 48 threads, but uses nearly all our ram.

There's a shared cache for the on-disk phrase table, which is probably where 
the contention is coming from. I don't think disabling the cache would help, 
since in a large phrase table you'll have tens of thousands of translations of 
common words and punctuation, which you don't want to reload for every 
sentence. A per-thread cache may improve things.
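
As a rough illustration of the per-thread idea (not the Moses code, which is 
C++), here is a minimal Python sketch. The class name PerThreadCache, the 
load_fn callback and the max_entries limit are all made up for the example. 
Each thread gets its own small LRU cache, so lookups never take a shared lock. 
The cost is that common entries are loaded once per thread rather than once 
overall.

```python
import threading
from collections import OrderedDict


class PerThreadCache:
    """LRU cache with one private instance per thread (hypothetical sketch)."""

    def __init__(self, load_fn, max_entries=10000):
        self._load = load_fn              # assumed: reads one entry from disk
        self._max = max_entries           # assumed per-thread size limit
        self._local = threading.local()   # separate attributes per thread

    def _cache(self):
        # Create this thread's cache lazily on first use.
        cache = getattr(self._local, "cache", None)
        if cache is None:
            cache = self._local.cache = OrderedDict()
        return cache

    def get(self, source_phrase):
        cache = self._cache()
        if source_phrase in cache:
            cache.move_to_end(source_phrase)   # mark as most recently used
            return cache[source_phrase]
        value = self._load(source_phrase)      # miss: load without any lock
        cache[source_phrase] = value
        if len(cache) > self._max:
            cache.popitem(last=False)          # evict least recently used
        return value
```

With N threads this holds up to N copies of the hot entries, so the per-thread 
limit would need to be chosen with the total memory budget in mind.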

> 
> Pruning is also not enough, our filtered phrase table still takes around
> 300 GB when loaded into memory, I did not even dare to try and load the
> unfiltered phrase-table into memory :). But I will take a look at the
> implementation from the marathon, thanks.

I think Hieu was referring to the approach described here, rather than to 
filtering:
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16
It may be of some use. It's hard to imagine that a 500 GB phrase table doesn't 
contain a lot of noise. I'm surprised that filtering doesn't remove more, 
though. Are you decoding large batches of sentences?

> 
> At the moment I am thinking about using a perfect hash function as an
> index and keeping target phrases as packed strings in memory. That
> should use about as much memory as a gzipped phrase table on disk, it
> will be slower though, but probably still faster than the binary version.
> 
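
To make the idea in the quoted paragraph concrete, here is a minimal Python 
sketch of a perfect-hash index over source phrases, with the target phrases 
held as packed strings in one byte blob. It is only an illustration under 
assumptions, not Marcin's design: the names build_index, slot_of and 
PackedPhraseTable are invented, the hash-and-displace construction is one 
textbook way to get a minimal perfect hash, source phrases are assumed to be 
unique, and zlib stands in for whatever compact encoding would really be used.

```python
import hashlib
import zlib


def _h(key: str, seed: int, size: int) -> int:
    """Seeded hash of key, reduced to the range [0, size)."""
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little") % size


def build_index(keys):
    """Hash-and-displace: find one seed per bucket so that every key
    lands in its own slot. Assumes the keys are unique."""
    n = len(keys)
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[_h(k, 0, n)].append(k)
    seeds, slots = [0] * n, [None] * n
    # Place the largest buckets first, while most slots are still free.
    for b in sorted(range(n), key=lambda i: -len(buckets[i])):
        if not buckets[b]:
            break
        seed = 1
        while True:
            pos = [_h(k, seed, n) for k in buckets[b]]
            if len(set(pos)) == len(pos) and all(slots[p] is None for p in pos):
                break
            seed += 1
        seeds[b] = seed
        for k, p in zip(buckets[b], pos):
            slots[p] = k
    return seeds, slots


def slot_of(key, seeds):
    """Slot assigned to key; unknown keys also map to some slot."""
    n = len(seeds)
    return _h(key, seeds[_h(key, 0, n)], n)


class PackedPhraseTable:
    def __init__(self, entries: dict[str, list[str]]):
        self.seeds, slots = build_index(list(entries))
        self.offsets, self.fingerprints = [0], []
        blob = bytearray()
        for key in slots:
            # Pack all target phrases of one source phrase into one record.
            blob += zlib.compress("\n".join(entries[key]).encode("utf-8"))
            self.offsets.append(len(blob))
            # Keep only a checksum of the key, not the key itself.
            self.fingerprints.append(zlib.crc32(key.encode("utf-8")))
        self.blob = bytes(blob)

    def lookup(self, source: str) -> list[str]:
        if not self.seeds:
            return []
        i = slot_of(source, self.seeds)
        if self.fingerprints[i] != zlib.crc32(source.encode("utf-8")):
            return []                      # source phrase is not in the table
        packed = self.blob[self.offsets[i]:self.offsets[i + 1]]
        return zlib.decompress(packed).decode("utf-8").split("\n")
```

Because the source phrases themselves are not stored, a 32-bit checksum is 
used to reject unknown phrases, which leaves a small chance of a false match. 
Every lookup also pays for decompression, which is the speed cost mentioned in 
the quoted text.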

Will look forward to seeing how you get on,

cheers - Barry

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
