Hi Kenneth, Great, looks like that did the job. Thank you all for the quick reaction. Cheers, Marcin
On 20/9/2011, "Kenneth Heafield" <[email protected]> wrote:
>I took a look at the existing FactorCollection code and it made me cry,
>so I rewrote it for revision 4242 including a better locking strategy.
>
>On 09/20/11 12:10, Marcin Junczys-Dowmunt wrote:
>> Hi Barry,
>> very high lock contention. Deadlock is the wrong word. With 48 threads
>> 'top' shows me roughly 120% of processor load instead of 4800%. Actual
>> translation speed, however, is far below single-thread speed.
>>
>> Yes, we are running an online system, filtering is not an option.
>> Bye,
>> Marcin
>>
>> On 20/9/2011, "Barry Haddow" <[email protected]> wrote:
>>
>>> Hi Marcin
>>>
>>> That makes sense. I looked at the locking in FactorCollection recently and
>>> realised that it wasn't implemented correctly, although I didn't know that
>>> it had the potential for deadlock.
>>>
>>> Do you know if it's an actual deadlock that you're observing, or very high
>>> lock contention?
>>>
>>> btw - why aren't you filtering the phrase table? Are you running an online
>>> system where the source sentences are not given in advance?
>>>
>>> cheers - Barry
>>>
>>> On Tuesday 20 September 2011 11:22:49 Marcin Junczys-Dowmunt wrote:
>>>> Hi all,
>>>> by the way, I have found the place where the heavy locking is occurring.
>>>> It's the lock in
>>>>
>>>> FactorCollection::AddFactor
>>>>
>>>> When I simply and naively remove that one, everything works at full
>>>> throttle with 48 threads and nothing bad seems to be happening. With
>>>> this lock in place the deadlock occurs starting at around 20 threads,
>>>> regardless of whether the binary phrase table is used or the in-memory
>>>> version.
>>>>
>>>> The size of the phrase table is also a factor. With a small phrase table
>>>> filtered according to a given test set there are no deadlocks. Does that
>>>> make any sense?
>>>>
>>>> Bye,
>>>> Marcin
>>>>
>>>> On 19/9/2011, "Barry Haddow" <[email protected]> wrote:
>>>>> Hi Marcin
>>>>>
>>>>> On Monday 19 September 2011 07:58:48 Marcin Junczys-Dowmunt wrote:
>>>>>> The binary implementation seems to become unusable with more than 10-12
>>>>>> threads. Speed drops as more threads are used until it nearly deadlocks
>>>>>> at around 30 threads. I am using a 48-core server with 512 GB RAM. Even
>>>>>> copying the binary phrase tables to a ramdisk does not solve the
>>>>>> problem. The behavior stays the same. The in-memory version works fine
>>>>>> with 48 threads, but uses nearly all our RAM.
>>>>> There's a shared cache for the on-disk phrase table, which is probably
>>>>> where the contention is coming from. I don't think disabling the cache
>>>>> would help as in a large phrase table you'll have 10s of 1000s of
>>>>> translations of common words and punctuation, which you don't want to
>>>>> reload for every sentence. A per-thread cache may improve things.
>>>>>
>>>>>> Pruning is also not enough, our filtered phrase table still takes around
>>>>>> 300 GB when loaded into memory, I did not even dare to try and load the
>>>>>> unfiltered phrase-table into memory :). But I will take a look at the
>>>>>> implementation from the marathon, thanks.
>>>>> I think Hieu was referring to this
>>>>> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16
>>>>> rather than filtering, which may be of some use. It's hard to imagine that
>>>>> a 500G phrase table doesn't contain a lot of noise. I'm surprised that
>>>>> filtering doesn't remove more though - are you decoding large batches of
>>>>> sentences?
>>>>>
>>>>>> At the moment I am thinking about using a perfect hash function as an
>>>>>> index and keeping target phrases as packed strings in memory. That
>>>>>> should use about as much memory as a gzipped phrase table on disk, it
>>>>>> will be slower though, but probably still faster than the binary
>>>>>> version.
>>>>> Will look forward to seeing how you get on,
>>>>>
>>>>> cheers - Barry
>>>>>
>>>>> --
>>>>> The University of Edinburgh is a charitable body, registered in
>>>>> Scotland, with registration number SC005336.
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> [email protected]
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>> --
>>> Barry Haddow
>>> University of Edinburgh
>>> +44 (0) 131 651 3173
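
Note on the locking discussed above: the thread does not show what the rewritten locking in revision 4242 actually looks like, so the following is only a generic sketch of one common way to reduce contention on a lookup-or-insert table such as the one behind FactorCollection::AddFactor. It assumes C++17 and uses a reader-writer lock so that lookups of already-known strings (the common case during decoding) can proceed in parallel, while the exclusive lock is taken only for a new entry. The class InternTable and its methods Add and Size are hypothetical names for illustration and are not Moses code.

```cpp
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a factor table; NOT the Moses FactorCollection.
class InternTable {
 public:
  // Returns a stable id for `word`, inserting it the first time it is seen.
  std::size_t Add(const std::string& word) {
    {
      // Fast path: any number of threads may hold the shared lock at once.
      std::shared_lock<std::shared_mutex> read(mutex_);
      auto it = ids_.find(word);
      if (it != ids_.end()) return it->second;
    }
    // Slow path: exclusive lock. emplace() does not overwrite, so if another
    // thread inserted `word` between the two locks, its id is returned.
    std::unique_lock<std::shared_mutex> write(mutex_);
    auto result = ids_.emplace(word, ids_.size());
    return result.first->second;
  }

  // Number of distinct entries seen so far.
  std::size_t Size() const {
    std::shared_lock<std::shared_mutex> read(mutex_);
    return ids_.size();
  }

 private:
  mutable std::shared_mutex mutex_;
  std::unordered_map<std::string, std::size_t> ids_;
};
```

Whether this helps in practice depends on the ratio of lookups to inserts; with a large unfiltered phrase table, as described above, new entries may still be frequent enough that the exclusive path matters, so this would need to be measured rather than assumed.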
