Hi all,
by the way, I have found the place where the heavy locking occurs.
It's the lock in

FactorCollection::AddFactor

When I simply and naively remove that lock, everything runs at full
throttle with 48 threads and nothing bad seems to happen. With the
lock in place, the deadlock occurs starting at around 20 threads,
regardless of whether the binary or the in-memory phrase table is
used.
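Removing the lock entirely is risky even if nothing visibly breaks. If the collection is backed by a standard container (assumed here), unsynchronized concurrent inserts are a data race and undefined behavior in C++. For read-mostly tables like this one, a common middle ground is a reader/writer lock. Lookups of existing entries then run in parallel, and only genuine inserts serialize.

The following is a minimal C++17 sketch under that assumption. `InternTable` and `Intern` are hypothetical names. They are not the Moses `FactorCollection` API.

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_set>

// Hypothetical, simplified stand-in for a string-interning table such as a
// factor collection; this is NOT the actual Moses class.
class InternTable {
 public:
  // Returns a pointer to the single stored copy of `s`, inserting it if needed.
  // Pointers to unordered_set elements stay valid across rehashing.
  const std::string* Intern(const std::string& s) {
    {
      // Fast path: most calls find an existing entry, and any number of
      // threads may hold the shared (reader) lock at the same time.
      std::shared_lock<std::shared_mutex> read(mutex_);
      auto it = set_.find(s);
      if (it != set_.end()) return &*it;
    }
    // Slow path: take the exclusive (writer) lock only when we may insert.
    std::unique_lock<std::shared_mutex> write(mutex_);
    // insert() is a no-op if another thread added `s` between the two locks.
    auto result = set_.insert(s);
    return &*result.first;
  }

  std::size_t Size() const {
    std::shared_lock<std::shared_mutex> read(mutex_);
    return set_.size();
  }

 private:
  mutable std::shared_mutex mutex_;
  std::unordered_set<std::string> set_;
};
```

Whether this removes the slowdown depends on how often new entries are actually inserted. Readers still touch the shared mutex, so some contention can remain at 48 threads.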

The size of the phrase table also matters. With a small phrase table
filtered for the given test set, there are no deadlocks. Does that
make any sense?
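For reference, test-set filtering as mentioned above keeps only those phrase-table entries whose source side actually occurs in the input. The following is a minimal C++17 sketch of that idea. It is not the Moses filtering tool.

It makes three assumptions:

- The table is plain text in the `source ||| target ||| scores` layout.
- The test sentences are already whitespace-tokenized.
- Phrases are limited to a maximum length `maxLen`.

`CollectSourcePhrases` and `FilterPhraseTable` are hypothetical names.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Collect every contiguous word n-gram (length <= maxLen) of the test sentences.
std::unordered_set<std::string> CollectSourcePhrases(
    const std::vector<std::string>& sentences, std::size_t maxLen) {
  std::unordered_set<std::string> phrases;
  for (const std::string& sentence : sentences) {
    std::istringstream in(sentence);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    for (std::size_t i = 0; i < words.size(); ++i) {
      std::string phrase;
      for (std::size_t j = i; j < words.size() && j - i < maxLen; ++j) {
        if (j > i) phrase += ' ';
        phrase += words[j];
        phrases.insert(phrase);
      }
    }
  }
  return phrases;
}

// Copy only the lines whose source side (text before the first " ||| ")
// can be used when decoding the test set. Returns the number of lines kept.
std::size_t FilterPhraseTable(std::istream& table, std::ostream& out,
                              const std::unordered_set<std::string>& phrases) {
  std::size_t kept = 0;
  for (std::string line; std::getline(table, line);) {
    std::size_t sep = line.find(" ||| ");
    if (sep == std::string::npos) continue;  // skip malformed lines
    if (phrases.count(line.substr(0, sep)) != 0) {
      out << line << '\n';
      ++kept;
    }
  }
  return kept;
}
```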

Bye,
Marcin

19/9/2011, "Barry Haddow" <[email protected]> wrote:

>Hi Marcin
>
>On Monday 19 September 2011 07:58:48 Marcin Junczys-Dowmunt wrote:
>> The binary implementation seems to become unusable with more than 10-12
>> threads. Speed drops as more threads are used until it nearly deadlocks
>> at around 30 threads. I am using a 48-core server with 512 GB of RAM. Even
>> copying the binary phrase tables to a ramdisk does not solve the
>> problem; the behavior stays the same. The in-memory version works fine
>> with 48 threads, but uses nearly all our RAM.
>
>There's a shared cache for the on-disk phrase table, which is probably where
>the contention is coming from. I don't think disabling the cache would help, as
>in a large phrase table you'll have tens of thousands of translations of common
>words and punctuation, which you don't want to reload for every sentence. A
>per-thread cache may improve things.
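A per-thread cache along the lines Barry suggests might look roughly like the following minimal C++17 sketch. It is hypothetical and not Moses code. `PerThreadCache` and `TargetList` are illustrative names, and the `loader` callback stands in for the shared on-disk lookup.

Each thread keeps a private map, so cache hits take no lock. The trade-off is that every thread warms its own copy, so memory use and cold-start misses grow with the thread count.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical value type: all target phrases for one source phrase.
using TargetList = std::vector<std::string>;

// Per-thread cache in front of a shared, slower lookup (e.g. an on-disk table).
// Each thread owns a private map, so cache hits never take a lock.
class PerThreadCache {
 public:
  // `loader` is called on a miss; it may run on many threads at once,
  // so it must be thread-safe itself.
  PerThreadCache(std::function<TargetList(const std::string&)> loader,
                 std::size_t maxEntries)
      : loader_(std::move(loader)), maxEntries_(maxEntries), id_(nextId_++) {}

  // The returned reference may be invalidated by this thread's next Get()
  // on the same cache, because that call can clear the map.
  const TargetList& Get(const std::string& source) {
    auto& cache = LocalMap();
    auto it = cache.find(source);
    if (it != cache.end()) return it->second;  // thread-private hit
    // Crude eviction policy: drop everything once the cap is reached.
    if (cache.size() >= maxEntries_) cache.clear();
    return cache.emplace(source, loader_(source)).first->second;
  }

 private:
  // This thread's map for this cache object. It is keyed by a unique id rather
  // than `this`, so a new cache at a reused address never sees stale entries.
  // (Maps of destroyed caches are only freed when the thread exits.)
  std::unordered_map<std::string, TargetList>& LocalMap() {
    thread_local std::unordered_map<
        std::uint64_t, std::unordered_map<std::string, TargetList>> maps;
    return maps[id_];
  }

  inline static std::atomic<std::uint64_t> nextId_{0};
  std::function<TargetList(const std::string&)> loader_;
  std::size_t maxEntries_;
  std::uint64_t id_;
};
```

The clear-on-full eviction only keeps the sketch short. An LRU policy would better retain the hot entries (common words and punctuation) that make the cache worthwhile in the first place.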
>
>>
>> Pruning is also not enough: our filtered phrase table still takes around
>> 300 GB when loaded into memory, and I did not even dare to try loading the
>> unfiltered phrase table into memory :). But I will take a look at the
>> implementation from the marathon, thanks.
>
>I think Hieu was referring to this
>http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16
>rather than filtering, which may be of some use. It's hard to imagine that a
>500G phrase table doesn't contain a lot of noise. I'm surprised that filtering
>doesn't remove more though - are you decoding large batches of sentences?
>
>>
>> At the moment I am thinking about using a perfect hash function as an
>> index and keeping target phrases as packed strings in memory. That
>> should use about as much memory as a gzipped phrase table on disk. It
>> will be slower, though probably still faster than the binary version.
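The idea of a perfect-hash index over packed target strings can be illustrated with the following minimal, hypothetical C++17 sketch. It is neither Moses code nor a tuned implementation. The sketch makes these choices:

- It uses a simple "hash and displace" minimal perfect hash (in the spirit of CHD) built on a seeded FNV-1a hash.
- It stores each source phrase's target side as one packed string.
- It keeps the keys as well, so lookups of unknown phrases can be rejected.

A real implementation would more likely store short fingerprints instead of full keys and compress the packed data. `PackedPhraseTable`, `PackedStrings` and `SeededHash` are illustrative names.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Seeded 64-bit string hash: FNV-1a, then a splitmix64-style finalizer so the
// low bits are well mixed before reducing modulo a small table size.
inline std::uint64_t SeededHash(std::string_view s, std::uint64_t seed) {
  std::uint64_t h = 14695981039346656037ULL ^ (seed * 0x9E3779B97F4A7C15ULL);
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;
  }
  h ^= h >> 30; h *= 0xBF58476D1CE4E5B9ULL;
  h ^= h >> 27; h *= 0x94D049BB133111EBULL;
  return h ^ (h >> 31);
}

// Strings stored back to back in one buffer; string i is data[offsets[i], offsets[i+1]).
struct PackedStrings {
  std::string data;
  std::vector<std::size_t> offsets{0};
  void Add(std::string_view s) { data.append(s); offsets.push_back(data.size()); }
  std::string_view At(std::size_t i) const {
    return std::string_view(data).substr(offsets[i], offsets[i + 1] - offsets[i]);
  }
};

// Read-only map from source phrase to packed target string. A minimal perfect
// hash gives every key a distinct slot in [0, n). Keys must be unique.
class PackedPhraseTable {
 public:
  explicit PackedPhraseTable(const std::vector<std::pair<std::string, std::string>>& entries)
      : n_(entries.size()), numBuckets_(entries.size() / 2 + 1), seeds_(numBuckets_, 0) {
    // Roughly two keys per bucket; each bucket later gets its own seed.
    std::vector<std::vector<std::size_t>> buckets(numBuckets_);
    for (std::size_t i = 0; i < n_; ++i)
      buckets[SeededHash(entries[i].first, 0) % numBuckets_].push_back(i);
    std::vector<std::size_t> order(numBuckets_);
    for (std::size_t b = 0; b < numBuckets_; ++b) order[b] = b;
    // Place the largest buckets first, while most slots are still free.
    std::sort(order.begin(), order.end(), [&](std::size_t x, std::size_t y) {
      return buckets[x].size() > buckets[y].size();
    });
    std::vector<bool> taken(n_, false);
    std::vector<std::size_t> entryAtSlot(n_);
    for (std::size_t b : order) {
      if (buckets[b].empty()) continue;
      bool placed = false;
      // Try seeds until all keys of this bucket land on distinct free slots.
      for (std::uint64_t seed = 1; seed < (1u << 20) && !placed; ++seed) {
        std::vector<std::size_t> slots;
        for (std::size_t i : buckets[b]) {
          std::size_t s = SeededHash(entries[i].first, seed) % n_;
          if (taken[s] || std::find(slots.begin(), slots.end(), s) != slots.end()) break;
          slots.push_back(s);
        }
        if (slots.size() != buckets[b].size()) continue;
        for (std::size_t k = 0; k < slots.size(); ++k) {
          taken[slots[k]] = true;
          entryAtSlot[slots[k]] = buckets[b][k];
        }
        seeds_[b] = seed;
        placed = true;
      }
      if (!placed) throw std::runtime_error("no displacement seed found (duplicate keys?)");
    }
    // Pack keys and values in slot order so that slot i indexes both buffers.
    for (std::size_t s = 0; s < n_; ++s) {
      keys_.Add(entries[entryAtSlot[s]].first);
      values_.Add(entries[entryAtSlot[s]].second);
    }
  }

  std::optional<std::string_view> Find(std::string_view key) const {
    if (n_ == 0) return std::nullopt;
    std::size_t slot = SeededHash(key, seeds_[SeededHash(key, 0) % numBuckets_]) % n_;
    // A perfect hash maps unknown keys to some slot too, so verify the key.
    if (keys_.At(slot) != key) return std::nullopt;
    return values_.At(slot);
  }

 private:
  std::size_t n_;
  std::size_t numBuckets_;
  std::vector<std::uint64_t> seeds_;
  PackedStrings keys_, values_;
};
```

A lookup costs two hash evaluations and one key comparison. Apart from the packed buffers and their offset arrays, the index needs only one seed per bucket.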
>>
>
>Will look forward to seeing how you get on,
>
>cheers - Barry
>

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
