On Tue, Aug 29, 2017 at 05:52:05PM +0200, Oleg Nesterov wrote: > The problem is that start_flush_work() does not do acquire/release > unconditionally, it does this only if it is going to wait, and I am not > sure this is right...
Right, I think you're right, we can move it earlier, once we have the pwq. > Plus process_one_work() does lock_map_acquire_read(), I don't really > understand this too. Right, so aside from recursive-reads being broken, I think that was a mistake. > > And the analogous: > > > > flush_workqueue(wq) process_one_work(wq, work) > > A(wq) A(wq) > > R(wq) work->func(work); > > R(wq) > > > > > > The thing I puzzled over was flush_work() (really start_flush_work()) > > doing: > > > > if (pwq->wq->saved_max_active == 1 || pwq->wq->rescuer) > > lock_map_acquire(&pwq->wq->lockdep_map); > > else > > lock_map_acquire_read(&pwq->wq->lockdep_map); > > lock_map_release(&pwq->wq->lockdep_map); > > > > Why does flush_work() care about the wq->lockdep_map? > > > > The answer is because, for single-threaded workqueues, doing > > flush_work() from a work is a potential deadlock: > > Yes, but the simple answer is that flush_work() doesn't really differ > from flush_workqueue() in this respect? Right, and I think that the new code (aside from maybe placing it earlier) does that. If single-threaded we use wq->lockdep_map, otherwise we (also) use work->lockdep_map. > If nothing else, if some WORK is the last queued work on WQ, then > flush_work(WORK) is the same thing as flush_workqueuw(WQ), more or less. > Again, I am talking about single-threaded workqueues. Right, so single-threaded workqueues are special and are what we need this extra bit for, but only for single-threaded. > > workqueue-thread: > > > > work-n: > > flush_work(work-n+1); > > > > work-n+1: > > > > > > Will not be going anywhere fast.. > > Or another example, > > lock(LOCK); > flush_work(WORK); > unlock(LOCK); > > workqueue-thread: > another_pending_work: > LOCK(LOCK); > UNLOCK(LOCK); > > WORK: > > In this case we do not care about WORK->lockdep_map, but > taking the wq->lockdep_map from flush_work() (if single-threaded) allows > to report the deadlock. Right. And the new code does so.