Hi Kenneth,
Great, looks like that did the job. Thank you all for the quick reaction.
Cheers,
Marcin

20/9/2011, "Kenneth Heafield" <[email protected]> wrote:

>I took a look at the existing FactorCollection code and it made me cry,
>so I rewrote it for revision 4242 including a better locking strategy. 
>
>On 09/20/11 12:10, Marcin Junczys-Dowmunt wrote:
>> Hi Barry,
>> very high lock contention. Deadlock is the wrong word. With 48 threads
>> 'top' shows me roughly 120% of processor load instead of 4800%. Actual
>> translation speed however is far below single thread.
>>
>> Yes, we are running an online system, filtering is not an option.
>> Bye,
>> Marcin
>>
>> 20/9/2011, "Barry Haddow" <[email protected]> wrote:
>>
>>> Hi Marcin
>>>
>>> That makes sense. I looked at the locking in FactorCollection recently and
>>> realised that it wasn't implemented correctly, although I didn't know that
>>> it had the potential for deadlock.
>>>
>>> Do you know if it's an actual deadlock that you're observing, or very high 
>>> lock contention?
>>>
>>> btw - why aren't you filtering the phrase table? Are you running an online 
>>> system where the source sentences are not given in advance?
>>>
>>> cheers - Barry
>>>
>>> On Tuesday 20 September 2011 11:22:49 Marcin Junczys-Dowmunt wrote:
>>>> Hi all,
>>>> by the way, I have found the place, where the heavy locking is occurring.
>>>> It's the lock in
>>>>
>>>> FactorCollection::AddFactor
>>>>
>>>> When I simply and naively remove that one, everything works on full
>>>> throttle with 48 threads and nothing bad seems to be happening. With
>>>> this lock in place the deadlock occurs starting with around 20 threads,
>>>> regardless of whether the binary phrase table is used or the in-memory
>>>> version.
>>>>
>>>> The size of the phrase table is also a factor. With a small phrase table
>>>> filtered according to given test set there are no deadlocks. Does that
>>>> make any sense?
>>>>
>>>> Bye,
>>>> Marcin
>>>>
>>>> 19/9/2011, "Barry Haddow" <[email protected]> wrote:
>>>>> Hi Marcin
>>>>>
>>>>> On Monday 19 September 2011 07:58:48 Marcin Junczys-Dowmunt wrote:
>>>>>> The binary implementation seems to become unusable with more than 10-12
>>>>>> threads. Speed drops as more threads are used until it nearly deadlocks
>>>>>> at around 30 threads. I am using a 48-core server with 512 GB RAM. Even
>>>>>> copying the binary phrase tables to a ramdisk does not solve the
>>>>>> problem. The behavior stays the same. The in-memory version works fine
>>>>>> with 48 threads, but uses nearly all our RAM.
>>>>> There's a shared cache for the on-disk phrase table, which is probably
>>>>> where the contention is coming from. I don't think disabling the cache
>>>>> would help as in a large phrase table you'll have 10s of 1000s of
>>>>> translations of common words and punctuation, which you don't want to
>>>>> reload for every sentence. A per-thread cache may improve things.
>>>>>
>>>>>> Pruning is also not enough, our filtered phrase table still takes around
>>>>>> 300 GB when loaded into memory, I did not even dare to try and load the
>>>>>> unfiltered phrase-table into memory :). But I will take a look at the
>>>>>> implementation from the marathon, thanks.
>>>>> I think Hieu was referring to this
>>>>> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16
>>>>> rather than filtering, which may be of some use. It's hard to imagine that
>>>>> a 500G phrase table doesn't contain a lot of noise. I'm surprised that
>>>>> filtering doesn't remove more though - are you decoding large batches of
>>>>> sentences?
>>>>>
>>>>>> At the moment I am thinking about using a perfect hash function as an
>>>>>> index and keeping target phrases as packed strings in memory. That
>>>>>> should use about as much memory as a gzipped phrase table on disk, it
>>>>>> will be slower though, but probably still faster than the binary
>>>>>> version.
>>>>> Will look forward to seeing how you get on,
>>>>>
>>>>> cheers - Barry
>>>>>
>>>>> --
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> [email protected]
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>> --
>>> Barry Haddow
>>> University of Edinburgh
>>> +44 (0) 131 651 3173
>

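For anyone reading this thread later: the contention described above is typical of a single exclusive lock around a lookup that almost always hits. Below is a minimal sketch of one common alternative, a reader-writer lock with a shared fast path. It is not the code from revision 4242 and not the Moses FactorCollection API; the class InternPool and its methods Add and Size are hypothetical names used only for illustration, and the sketch assumes C++17 (std::shared_mutex).

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a factor collection: interns strings and hands
// out stable pointers to them. Illustrative only, not the Moses class.
class InternPool {
public:
  // Returns a pointer to the single stored copy of s, adding it if needed.
  const std::string* Add(const std::string& s) {
    {
      // Fast path: any number of threads may hold the shared lock at once,
      // so lookups of already-known strings do not serialize.
      std::shared_lock<std::shared_mutex> read(mutex_);
      auto it = pool_.find(s);
      if (it != pool_.end()) return &*it;
    }
    // Slow path: take the exclusive lock. insert() checks again, so if
    // another thread added the same string in between, we get its entry.
    std::unique_lock<std::shared_mutex> write(mutex_);
    return &*pool_.insert(s).first;
  }

  std::size_t Size() const {
    std::shared_lock<std::shared_mutex> read(mutex_);
    return pool_.size();
  }

private:
  mutable std::shared_mutex mutex_;
  // Pointers to elements of std::unordered_set stay valid across rehashing,
  // and nothing is ever erased, so returned pointers remain usable.
  std::unordered_set<std::string> pool_;
};
```

With a large phrase table most calls ask for strings that are already present, so under this scheme they would only take the shared lock; the exclusive lock is needed only for the first occurrence of each string. Whether this is enough at 48 threads would have to be measured.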