Re: [Moses-support] Faster decoding with multiple moses instances

Marcin Junczys-Dowmunt Thu, 08 Oct 2015 14:03:44 -0700

I have a branch, "unblockpt", where those locks are gone and the caches are
thread-local. Hieu claims there is still no speed-up.
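A minimal sketch of what thread-local caches can look like, assuming C++11 `thread_local`. This is not the code in the unblockpt branch. The type aliases (`SourceKey`, `TranslationsPtr`), the class `ThreadLocalPhraseCache`, and the helper `CacheSizeSeenByNewThread` are hypothetical stand-ins rather than the Moses API. Because each thread owns its own map, `Cache` and `Retrieve` need no mutex at all.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Moses types: source phrases are keyed by their
// surface string, and a translation list is a shared vector of target strings
// (playing the role of TargetPhraseVectorPtr).
using SourceKey = std::string;
using TranslationsPtr = std::shared_ptr<const std::vector<std::string>>;

// One cache per thread: no mutex is needed because no other thread can
// ever reach this instance.
class ThreadLocalPhraseCache {
 public:
  // Each calling thread lazily constructs its own instance on first use.
  static ThreadLocalPhraseCache &ForThisThread() {
    thread_local ThreadLocalPhraseCache cache;
    return cache;
  }

  void Cache(const SourceKey &src, TranslationsPtr tpv) {
    m_cache.emplace(src, std::move(tpv));  // keeps an existing entry
  }

  // Returns a null pointer on a cache miss.
  TranslationsPtr Retrieve(const SourceKey &src) const {
    auto it = m_cache.find(src);
    return it == m_cache.end() ? TranslationsPtr() : it->second;
  }

  std::size_t Size() const { return m_cache.size(); }

 private:
  std::unordered_map<SourceKey, TranslationsPtr> m_cache;
};

// Reports how many entries a freshly started thread sees in its own cache.
std::size_t CacheSizeSeenByNewThread() {
  std::size_t size = 0;
  std::thread worker([&size] { size = ThreadLocalPhraseCache::ForThisThread().Size(); });
  worker.join();
  return size;
}
```

The trade-off is memory and warm-up: every decoding thread keeps its own copy of frequently used entries and has to populate its cache separately.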


On 08.10.2015 at 21:56, Kenneth Heafield wrote:
> Good point.  I now blame this code from
> moses/TranslationModel/CompactPT/TargetPhraseCollectionCache.h
>
> Looks like a case for a concurrent fixed-size hash table.  Failing that,
> banded locks instead of a single lock?  Namely an array of hash tables,
> each of which is independently locked.
>
>    /** store translations for source phrase in persistent cache **/
>    void Cache(const Phrase &sourcePhrase, TargetPhraseVectorPtr tpv,
>               size_t bitsLeft = 0, size_t maxRank = 0) {
> #ifdef WITH_THREADS
>      boost::mutex::scoped_lock lock(m_mutex);
> #endif
>
>      // check if source phrase is already in cache
>      iterator it = m_phraseCache.find(sourcePhrase);
>      if(it != m_phraseCache.end())
>        // if found, just update clock
>        it->second.m_clock = clock();
>      else {
>        // else, add to cache
>        if(maxRank && tpv->size() > maxRank) {
>          TargetPhraseVectorPtr tpv_temp(new TargetPhraseVector());
>          tpv_temp->resize(maxRank);
>          std::copy(tpv->begin(), tpv->begin() + maxRank, tpv_temp->begin());
>          m_phraseCache[sourcePhrase] = LastUsed(clock(), tpv_temp, bitsLeft);
>        } else
>          m_phraseCache[sourcePhrase] = LastUsed(clock(), tpv, bitsLeft);
>      }
>    }
>
>    std::pair<TargetPhraseVectorPtr, size_t> Retrieve(const Phrase &sourcePhrase) {
> #ifdef WITH_THREADS
>      boost::mutex::scoped_lock lock(m_mutex);
> #endif
>
>      iterator it = m_phraseCache.find(sourcePhrase);
>      if(it != m_phraseCache.end()) {
>        LastUsed &lu = it->second;
>        lu.m_clock = clock();
>        return std::make_pair(lu.m_tpv, lu.m_bitsLeft);
>      } else
>        return std::make_pair(TargetPhraseVectorPtr(), 0);
>    }
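The banded-lock fallback (commonly called lock striping) can be sketched as follows, assuming C++11 with `std::mutex` and `std::lock_guard` in place of the `boost::mutex::scoped_lock` used in the quoted Moses code. `StripedPhraseCache`, `FillConcurrently`, and the type aliases are hypothetical names. Unlike the quoted `TargetPhraseCollectionCache`, the sketch omits the clock-based last-used bookkeeping and the `maxRank`/`bitsLeft` handling.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Moses types (not the CompactPT API).
using SourceKey = std::string;
using TranslationsPtr = std::shared_ptr<const std::vector<std::string>>;

// Lock striping: NumStripes independent hash tables, each with its own mutex.
// A key always hashes to the same stripe, so threads contend only when their
// keys land in the same stripe rather than on every access.
template <std::size_t NumStripes = 16>
class StripedPhraseCache {
  static_assert(NumStripes > 0, "need at least one stripe");

 public:
  void Cache(const SourceKey &src, TranslationsPtr tpv) {
    Stripe &s = StripeFor(src);
    std::lock_guard<std::mutex> lock(s.mutex);
    s.table.emplace(src, std::move(tpv));  // keeps an existing entry
  }

  // Returns a null pointer on a miss; the shared_ptr copy is made while the
  // stripe lock is held.
  TranslationsPtr Retrieve(const SourceKey &src) {
    Stripe &s = StripeFor(src);
    std::lock_guard<std::mutex> lock(s.mutex);
    auto it = s.table.find(src);
    return it == s.table.end() ? TranslationsPtr() : it->second;
  }

  // Locks each stripe in turn; the total is exact only when no writers run.
  std::size_t Size() {
    std::size_t total = 0;
    for (Stripe &s : m_stripes) {
      std::lock_guard<std::mutex> lock(s.mutex);
      total += s.table.size();
    }
    return total;
  }

 private:
  struct Stripe {
    std::mutex mutex;
    std::unordered_map<SourceKey, TranslationsPtr> table;
  };

  Stripe &StripeFor(const SourceKey &src) {
    return m_stripes[std::hash<SourceKey>{}(src) % NumStripes];
  }

  std::array<Stripe, NumStripes> m_stripes;
};

// Usage example: several threads insert disjoint keys concurrently.
template <std::size_t N>
void FillConcurrently(StripedPhraseCache<N> &cache, int numThreads, int perThread) {
  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t) {
    workers.emplace_back([&cache, t, perThread] {
      for (int i = 0; i < perThread; ++i) {
        std::string key = "phrase" + std::to_string(t * perThread + i);
        cache.Cache(key, std::make_shared<std::vector<std::string>>(1, key));
      }
    });
  }
  for (std::thread &w : workers) w.join();
}
```

With 16 stripes and uniformly hashed keys, two concurrent accesses hit the same mutex roughly 1 time in 16 instead of every time. Whether that matters depends on this lock actually being the bottleneck; per the top message, removing the locks entirely (the unblockpt branch) reportedly did not yield a speed-up.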
>
>
>
> On 10/08/2015 08:39 PM, Marcin Junczys-Dowmunt wrote:
>> How is probing-pt avoiding the same problem then?
>>
>> On 08.10.2015 at 21:36, Kenneth Heafield wrote:
>>> There's a ton of object/malloc churn in creating Moses::TargetPhrase
>>> objects, most of which are thrown away.  If PhraseDictionaryMemory
>>> (which creates and keeps the objects) scales better than CompactPT,
>>> that's the first thing I'd optimize.
>>>
>>> On 10/08/2015 08:30 PM, Marcin Junczys-Dowmunt wrote:
>>>> We did quite a bit of experimenting with that; usually there is hardly
>>>> any measurable quality loss until you go below 1000. That's good enough
>>>> for deployment systems. It seems, however, that you can get up to a 0.4
>>>> BLEU increase when going really high (about 5000 and beyond) with larger
>>>> distortion limits. But that's rather uninteresting for commercial applications.
>>>>
>>>> On 08.10.2015 at 21:24, Michael Denkowski wrote:
>>>>> Hi Vincent,
>>>>>
>>>>> That definitely helps.  I reran everything comparing the original
>>>>> 2000/2000 to your suggestion of 400/400.  There isn't much difference
>>>>> for a single multi-threaded instance, but there's about a 30% speedup
>>>>> when using all single-threaded instances:
>>>>>
>>>>>                  pop limit & stack size
>>>>> procs x threads    2000      400
>>>>> 1x16               5.46     5.68
>>>>> 2x8                7.58     8.70
>>>>> 4x4                9.71    11.24
>>>>> 8x2               12.50    15.87
>>>>> 16x1              14.08    18.52
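As a quick check of the "about 30%" figure, the sketch below recomputes the relative gain of 400/400 over 2000/2000 for each row of the table. The numbers are copied from the table; the thread does not state their units, so the code only assumes that higher values mean faster decoding. `Row`, `kRows`, `SpeedupPercent`, and `PrintSpeedups` are illustrative names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

// Figures copied from the table above. Units are not stated in the thread;
// higher values are assumed to mean faster decoding.
struct Row {
  std::string procsThreads;  // processes x threads per process
  double limit2000;          // pop limit & stack size = 2000
  double limit400;           // pop limit & stack size = 400
};

const std::vector<Row> kRows = {
    {"1x16", 5.46, 5.68},  {"2x8", 7.58, 8.70},  {"4x4", 9.71, 11.24},
    {"8x2", 12.50, 15.87}, {"16x1", 14.08, 18.52},
};

// Relative gain of 400/400 over 2000/2000, in percent.
double SpeedupPercent(const Row &r) {
  return (r.limit400 / r.limit2000 - 1.0) * 100.0;
}

// Prints one line per configuration with its percentage gain.
void PrintSpeedups() {
  for (const Row &r : kRows)
    std::printf("%-5s %6.1f%%\n", r.procsThreads.c_str(), SpeedupPercent(r));
}
```

For 16x1 this gives about 31.5%, in line with the reported ~30%, while 1x16 gains only about 4%, matching "there isn't much difference for a single multi-threaded instance".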
>>>>>
>>>>> There wasn't any degradation in BLEU/TER/Meteor, but this is just one
>>>>> data point and a fairly simple system.  I would be curious to see how
>>>>> things work out in other users' systems.
>>>>>
>>>>> Best,
>>>>> Michael
>>>>>
>>>>> On Thu, Oct 8, 2015 at 2:34 PM, Vincent Nguyen wrote:
>>>>>
>>>>>       Out of curiosity, what gain do you get with 400 for both the stack
>>>>>       size and the cube-pruning pop limit?
>>>>>
>>>>>
>>>>>       On 08/10/2015 20:26, Michael Denkowski wrote:
>>>>>
>>>>>           Hi Vincent,
>>>>>
>>>>>           I'm using cube pruning with the following options for all data
>>>>>           points:
>>>>>
>>>>>           [search-algorithm]
>>>>>           1
>>>>>
>>>>>           [cube-pruning-deterministic-search]
>>>>>           true
>>>>>
>>>>>           [cube-pruning-pop-limit]
>>>>>           2000
>>>>>
>>>>>           [stack]
>>>>>           2000
>>>>>
>>>>>           Best,
>>>>>           Michael
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> [email protected]
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support