So, shall I merge with master then?

On 09.10.2015 at 18:14, Hieu Hoang wrote:
It appears I screwed up. The unblockpt branch is faster at a large number of threads. Its downside is that it appears to use more memory, so multi_moses.py brings down the server at high thread counts. Can't win them all.


Threads           1          5         10         15         20         25         30         35

49 Baseline (master)
  real    4m56.474s  1m17.770s  0m50.482s  0m49.970s  0m50.851s  0m52.411s  0m54.263s  0m55.137s
  user    4m39.099s  5m39.706s  6m32.275s  7m54.693s   8m7.420s   8m7.606s  8m26.099s   8m8.707s
  sys     0m17.379s  0m35.081s  0m55.350s  1m13.207s  1m21.048s  1m25.325s  1m26.464s  1m28.651s

50 (49) + unblockpt
  real    4m52.220s  1m16.839s  0m45.847s  0m38.332s  0m36.764s  0m36.254s  0m36.254s  0m36.833s
  user    4m34.703s  5m38.984s  6m14.616s  7m14.220s  8m45.198s  9m49.285s  9m49.285s 11m51.531s
  sys     0m17.484s  0m34.341s  0m57.122s  1m34.292s  2m19.347s  3m34.444s  3m34.444s  4m55.236s

51 (50) + multi_moses
  real            -  1m16.387s  0m41.680s  0m38.793s  0m31.237s    Crashed    Crashed    Crashed
  user            -   5m6.564s  5m21.844s  5m44.855s  6m21.015s          -          -          -
  sys             -  0m40.458s  0m57.749s  1m16.392s  1m44.173s          -          -          -

52 (49) + multi_moses
  real            -  1m32.930s  0m49.833s  0m49.833s  0m28.860s  0m30.364s          -          -
  user            -   5m2.480s  5m14.156s  5m14.156s  6m22.374s  6m40.412s          -          -
  sys             -  0m35.557s  0m53.235s  0m53.235s  1m41.948s  2m14.619s          -          -

53 (50) + probing
  real    4m36.515s  1m13.842s  0m44.441s  0m36.498s  0m34.639s  0m33.218s  0m33.003s  0m33.482s
  user    4m20.862s  5m20.037s   6m0.768s  6m56.545s  8m21.316s  9m20.490s 10m22.638s 10m50.360s
  sys     0m15.712s  0m35.746s  0m53.254s  1m19.331s  1m54.006s  2m40.239s  3m43.040s  3m59.816s


On 08/10/2015 21:00, Marcin Junczys-Dowmunt wrote:
I have a branch, "unblockpt", where those locks are gone and the caches
are thread-local. Hieu claims there is still no speed-up.
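Roughly, each thread owns its own cache behind boost::thread_specific_ptr,
so lookups never take a lock. A minimal sketch of that idea (not the actual
unblockpt code; PhraseCache here is an illustrative stand-in):

#include <boost/thread/tss.hpp>
#include <map>
#include <string>

// Illustrative stand-in for the per-thread cache; the real one maps
// source phrases to target phrase collections.
struct PhraseCache {
  std::map<std::string, std::string> entries;
};

// One PhraseCache per thread; boost destroys it when the thread exits.
static boost::thread_specific_ptr<PhraseCache> s_cache;

PhraseCache &LocalCache() {
  // Lazily create this thread's cache on first access; no mutex needed
  // because no other thread can ever touch it.
  if (!s_cache.get())
    s_cache.reset(new PhraseCache());
  return *s_cache;
}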

On 08.10.2015 at 21:56, Kenneth Heafield wrote:
Good point.  I now blame this code from
moses/TranslationModel/CompactPT/TargetPhraseCollectionCache.h

Looks like a case for a concurrent fixed-size hash table.  Failing that,
banded locks instead of a single lock?  Namely an array of hash tables,
each of which is independently locked.

    /** insert translations for a source phrase into the persistent cache **/
    void Cache(const Phrase &sourcePhrase, TargetPhraseVectorPtr tpv,
               size_t bitsLeft = 0, size_t maxRank = 0) {
#ifdef WITH_THREADS
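      // One mutex guards the whole cache, so every thread's calls
      // serialize here.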
      boost::mutex::scoped_lock lock(m_mutex);
#endif

      // check if source phrase is already in cache
      iterator it = m_phraseCache.find(sourcePhrase);
      if(it != m_phraseCache.end())
        // if found, just update clock
        it->second.m_clock = clock();
      else {
        // else, add to cache
        if(maxRank && tpv->size() > maxRank) {
          TargetPhraseVectorPtr tpv_temp(new TargetPhraseVector());
          tpv_temp->resize(maxRank);
          std::copy(tpv->begin(), tpv->begin() + maxRank, tpv_temp->begin());
          m_phraseCache[sourcePhrase] = LastUsed(clock(), tpv_temp, bitsLeft);
        } else
          m_phraseCache[sourcePhrase] = LastUsed(clock(), tpv, bitsLeft);
      }
    }

    std::pair<TargetPhraseVectorPtr, size_t> Retrieve(const Phrase &sourcePhrase) {
#ifdef WITH_THREADS
      boost::mutex::scoped_lock lock(m_mutex);
#endif

      iterator it = m_phraseCache.find(sourcePhrase);
      if(it != m_phraseCache.end()) {
        LastUsed &lu = it->second;
        lu.m_clock = clock();
        return std::make_pair(lu.m_tpv, lu.m_bitsLeft);
      } else
        return std::make_pair(TargetPhraseVectorPtr(), 0);
    }
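
The banded-lock option would look something like this, very roughly (an
untested sketch, not Moses code; the names and the band count are invented).
Each band is an independently locked map, selected by hashing the key, so
threads only contend when their keys land in the same band:

#include <cstddef>
#include <boost/functional/hash.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/unordered_map.hpp>

template <typename Key, typename Value, std::size_t NumBands = 16>
class BandedCache {
public:
  void Put(const Key &key, const Value &value) {
    Band &band = SelectBand(key);
    boost::mutex::scoped_lock lock(band.mutex);  // locks one band only
    band.map[key] = value;
  }

  bool Get(const Key &key, Value &out) {
    Band &band = SelectBand(key);
    boost::mutex::scoped_lock lock(band.mutex);
    typename boost::unordered_map<Key, Value>::const_iterator it =
        band.map.find(key);
    if (it == band.map.end()) return false;
    out = it->second;
    return true;
  }

private:
  struct Band {
    boost::mutex mutex;
    boost::unordered_map<Key, Value> map;
  };

  // Hash the key to pick a band; unrelated keys rarely share a lock.
  Band &SelectBand(const Key &key) {
    return m_bands[boost::hash<Key>()(key) % NumBands];
  }

  Band m_bands[NumBands];
};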



On 10/08/2015 08:39 PM, Marcin Junczys-Dowmunt wrote:
How is probing-pt avoiding the same problem then?

On 08.10.2015 at 21:36, Kenneth Heafield wrote:
There's a ton of object/malloc churn in creating Moses::TargetPhrase
objects, most of which are thrown away.  If PhraseDictionaryMemory
(which creates and keeps the objects) scales better than CompactPT,
that's the first thing I'd optimize.
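
The usual fix for that kind of churn is a free list / object pool along
these lines (a generic sketch of the technique, not Moses code; kept
thread-local it also avoids any locking):

#include <cstddef>
#include <vector>

// Recycle discarded objects instead of paying malloc/free for every
// throwaway object; T stands in for something TargetPhrase-like.
template <typename T>
class Recycler {
public:
  T *Acquire() {
    if (m_free.empty()) return new T();
    T *obj = m_free.back();
    m_free.pop_back();
    return obj;
  }
  // Caller is responsible for resetting obj's state before reuse.
  void Release(T *obj) { m_free.push_back(obj); }
  ~Recycler() {
    for (std::size_t i = 0; i < m_free.size(); ++i) delete m_free[i];
  }
private:
  std::vector<T *> m_free;
};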

On 10/08/2015 08:30 PM, Marcin Junczys-Dowmunt wrote:
We did quite a bit of experimenting with that; usually there is hardly
any measurable quality loss until you get below 1000. Good enough for
deployment systems. It seems, however, that you can get up to a 0.4 BLEU
increase when going really high (around 5000 and beyond) with larger
distortion limits. But that's rather uninteresting for commercial
applications.

On 08.10.2015 at 21:24, Michael Denkowski wrote:
Hi Vincent,

That definitely helps.  I reran everything comparing the original
2000/2000 to your suggestion of 400/400.  There isn't much difference
for a single multi-threaded instance, but there's about a 30% speedup
when using all single-threaded instances:

                pop limit & stack
procs/threads    2000      400
1x16             5.46     5.68
2x8              7.58     8.70
4x4              9.71    11.24
8x2             12.50    15.87
16x1            14.08    18.52
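
(Here 400/400 just means the same moses.ini settings quoted below, with
both limits lowered:)

[cube-pruning-pop-limit]
400

[stack]
400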

There wasn't any degradation to BLEU/TER/Meteor, but this is just one
data point and a fairly simple system.  I would be curious to see how
things work out in other users' systems.

Best,
Michael

On Thu, Oct 8, 2015 at 2:34 PM, Vincent Nguyen <vngu...@neuf.fr> wrote:

       Out of curiosity, what gain do you get with 400 for both the stack
       limit and the cube-pruning pop limit?


        On 08/10/2015 20:26, Michael Denkowski wrote:

           Hi Vincent,

           I'm using cube pruning with the following options for all data
           points:

           [search-algorithm]
           1

           [cube-pruning-deterministic-search]
           true

           [cube-pruning-pop-limit]
           2000

           [stack]
           2000

           Best,
           Michael





--
Hieu Hoang
http://www.hoang.co.uk/hieu

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
