On Wednesday, February 27, 2019 at 1:35:04 AM UTC-8, Dmitry Vyukov wrote:
>
> Hi, 
>
> TensorFlow CPU task scheduler I wrote some time ago: 
>
> https://bitbucket.org/eigen/eigen/src/9fdbc9e653c1adc64db1078fd2ec899c1e33ba0c/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h?at=default&fileviewer=file-view-default
>
> https://bitbucket.org/eigen/eigen/src/9fdbc9e653c1adc64db1078fd2ec899c1e33ba0c/unsupported/Eigen/CXX11/src/ThreadPool/RunQueue.h?at=default&fileviewer=file-view-default
>
> https://bitbucket.org/eigen/eigen/src/9fdbc9e653c1adc64db1078fd2ec899c1e33ba0c/unsupported/Eigen/CXX11/src/ThreadPool/EventCount.h?at=default&fileviewer=file-view-default
>
> This and some other fixes improved model training time on CPU by up to 3x. 
> There is some interesting lock-free stuff in there. 
>
> The main design goal is to tailor the pieces to the specific problem at
> hand and be practical, e.g. lock-free where it matters, mutex-based
> otherwise. A sane level of complexity (need to get it right). Not trading
> practical properties that matter (array-based queue) for formal
> properties (lock-freedom).
>
> EventCount (aka "condvar for lock-free algos") has fast common paths 
> and minimizes contention as much as possible. Due to 
> non-uniform/bursty work, blocking/unblocking threads all the time 
> actually turned out to be one of the most critical parts of scheduling 
> in TF. 
>

Gotta love the eventcount. Fwiw, I remember way back when I created one in 
2005:

https://groups.google.com/d/topic/comp.programming.threads/qoxirQbbs4A/discussion

Actually, iirc, Joe Seigh created a very interesting bakery-style rw 
spinlock; I need to find it. Iirc, it can be outfitted with an eventcount to 
remove the spins.
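
For anyone new to the idea, here is a minimal sketch of the eventcount wait protocol: announce the intent to wait, re-check the condition, and only then block. It is a simplified mutex/condvar version, not the Eigen EventCount and not the 2005 one linked above. The names (PrepareWait, CancelWait, CommitWait, Notify, DemoHandoff) are made up for illustration, and Notify wakes all waiters instead of targeting one.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Simplified eventcount (illustrative names, not Eigen's API).
class EventCount {
 public:
  using Key = std::uint64_t;

  // Step 1: register as a waiter and take a snapshot of the epoch.
  // The caller must re-check its condition after this call.
  Key PrepareWait() {
    waiters_.fetch_add(1, std::memory_order_seq_cst);
    Key key = epoch_.load(std::memory_order_seq_cst);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return key;
  }

  // Step 2a: the re-check found work, so withdraw the registration.
  void CancelWait() { waiters_.fetch_sub(1, std::memory_order_seq_cst); }

  // Step 2b: still nothing to do, block until the epoch moves past key.
  void CommitWait(Key key) {
    {
      std::unique_lock<std::mutex> lock(mu_);
      cv_.wait(lock, [&] {
        return epoch_.load(std::memory_order_relaxed) != key;
      });
    }
    waiters_.fetch_sub(1, std::memory_order_seq_cst);
  }

  // Called after publishing work. Fast path: no registered waiters,
  // so no mutex and no condvar traffic.
  void Notify() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (waiters_.load(std::memory_order_seq_cst) == 0) return;
    {
      std::lock_guard<std::mutex> lock(mu_);
      epoch_.fetch_add(1, std::memory_order_seq_cst);
    }
    cv_.notify_all();  // simplification: wake everyone
  }

 private:
  std::atomic<std::uint64_t> waiters_{0};
  std::atomic<std::uint64_t> epoch_{0};
  std::mutex mu_;
  std::condition_variable cv_;
};

// Hypothetical demo: one consumer waits for a single item.
int DemoHandoff() {
  EventCount ec;
  std::atomic<int> item{0};  // 0 means "no work yet"
  int received = 0;
  std::thread consumer([&] {
    for (;;) {
      int v = item.load(std::memory_order_acquire);
      if (v != 0) { received = v; return; }
      EventCount::Key key = ec.PrepareWait();
      v = item.load(std::memory_order_acquire);  // the crucial re-check
      if (v != 0) { ec.CancelWait(); received = v; return; }
      ec.CommitWait(key);
    }
  });
  item.store(42, std::memory_order_release);  // publish work first
  ec.Notify();                                // then notify
  consumer.join();
  assert(received == 42);
  return received;
}
```

The point of the two-step wait is that the producer never touches the mutex unless somebody has actually registered as a waiter, which is what keeps the common path cheap.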

 

>
> RunQueue (work-stealing deque) allows all operations from both sides, 
> as work can be submitted from external threads in TF. Owner ops are 
> lock-free, remote ops use a mutex. This part was somewhat unique compared 
> to similar systems. Getting Size right is tricky! 
>
> ThreadPool (the scheduler itself) is a quite standard design based on 
> distributed runqueues. Though you can find some interesting tricks wrt 
> stealing order there. Also steal partitions (not done by me). The spinning 
> logic required quite some tuning to balance between latency and wasted 
> CPU. 
>
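
To make the "operations from both sides" shape concrete, here is a rough sketch of a fixed-capacity, array-based run queue. It is a deliberate simplification: every operation takes a mutex, whereas the RunQueue described above keeps the owner ops lock-free. The class and function names (SimpleRunQueue, DemoStealOrder) are made up for illustration and are not the Eigen code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>

// Fixed-capacity ring buffer usable from both ends.
// Simplification: all four operations lock the same mutex.
template <typename T, std::size_t kCapacity>
class SimpleRunQueue {
 public:
  // Owner side: push at the front. Returns false if the queue is full.
  bool PushFront(T v) {
    std::lock_guard<std::mutex> lock(mu_);
    if (size_ == kCapacity) return false;
    front_ = (front_ + kCapacity - 1) % kCapacity;
    slots_[front_] = std::move(v);
    ++size_;
    return true;
  }

  // Owner side: pop the most recently pushed front element (LIFO).
  bool PopFront(T* out) {
    std::lock_guard<std::mutex> lock(mu_);
    if (size_ == 0) return false;
    *out = std::move(slots_[front_]);
    front_ = (front_ + 1) % kCapacity;
    --size_;
    return true;
  }

  // Remote side: external submitters append at the back.
  bool PushBack(T v) {
    std::lock_guard<std::mutex> lock(mu_);
    if (size_ == kCapacity) return false;
    slots_[(front_ + size_) % kCapacity] = std::move(v);
    ++size_;
    return true;
  }

  // Remote side: thieves take from the back.
  bool PopBack(T* out) {
    std::lock_guard<std::mutex> lock(mu_);
    if (size_ == 0) return false;
    --size_;
    *out = std::move(slots_[(front_ + size_) % kCapacity]);
    return true;
  }

  std::size_t Size() const {
    std::lock_guard<std::mutex> lock(mu_);
    return size_;
  }

 private:
  mutable std::mutex mu_;
  std::array<T, kCapacity> slots_{};
  std::size_t front_ = 0;  // index of the current front element
  std::size_t size_ = 0;   // number of stored elements
};

// Hypothetical demo: the owner gets its newest task, a thief the oldest.
int DemoStealOrder() {
  SimpleRunQueue<int, 4> q;
  q.PushFront(1);
  q.PushFront(2);
  q.PushFront(3);
  int owner = 0;
  int thief = 0;
  q.PopFront(&owner);  // owner takes the newest task
  q.PopBack(&thief);   // thief takes the oldest task
  assert(q.Size() == 1);
  return owner * 10 + thief;
}
```

With a single lock, Size is trivial. Once the owner side stops taking the mutex, front and back can move concurrently and a consistent size has to be derived from both ends, which fits the remark above that getting Size right is tricky.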

Nice! Btw, did you ever find a use for work requesting?

Something like:

https://groups.google.com/d/topic/comp.programming.threads/YBnjd-Sqc-w/discussion
 

;^)
