rrw_enter_read_impl() should be static. The 'prio' argument to it should be boolean_t (and the values passed should be B_TRUE, B_FALSE).
Perhaps dmu_objset_find_dp_cb() should have a comment explaining why it needs to use the _prio() routine. Otherwise, looks good. --matt On Fri, Jun 12, 2015 at 9:55 PM, Arne Jansen <[email protected]> wrote: > On 13/06/15 06:47, Arne Jansen wrote: > >> On 13/06/15 04:18, Matthew Ahrens wrote: >> >>> >>> >>> On Thu, Jun 4, 2015 at 1:31 AM, Arne Jansen <[email protected] >>> <mailto:[email protected]>> wrote: >>> >>> On 06/03/2015 10:40 AM, Arne Jansen wrote: >>> > Unfortunately we just discovered a deadlock with this code: >>> > >>> > When dmu_objset_find_dp gets called with a read lock held, it fans >>> > out the work to the task queue. Each task in turn acquires its own >>> > read lock before calling the callback. If during this process >>> anyone >>> > tries to a acquire a write lock, it will stall all read lock >>> requests. >>> > Thus the tasks will never finish, the read lock of the caller will >>> > never get freed and the write lock never acquired. deadlock. >>> > >>> > I'm still thinking about how to fix this. Probably the best way >>> would >>> > be that the caller doesn't hold the read lock in the first >>> place, but >>> > I'm not sure if that's feasible for all callers. >>> > >>> >>> I think it is not. My current plan for fixing this is basically to >>> implement >>> the plan sketched out in the first comment of dmu_objset_find_dp: >>> >>> * In the future it might be possible to get some magic into >>> * dsl_pool_config_held in a way that it returns true for >>> * the worker threads so that a single lock held from this >>> * thread suffices. >>> >>> So instead of getting the lock in dmu_objset_find_dp_cb with >>> dsl_pool_config_enter, I'd call a to-be-implemented function >>> dsl_pool_config_inherit, that marks the thread as a co-owner of >>> the lock >>> in the thread's tsd. dsl_pool_config_held can than check for this >>> co-ownership. >>> >>> This would also make it possible to parallelize the write lock case, >>> although >>> this would need extra testing. >>> >>> What do you think? >>> >>> >>> I think we need to fix this sooner rather than later, either along the >>> lines of what you described, or by reverting the change. >>> >>> As we discussed on IRC, I think it would be cleanest to add this >>> lock-inheriting logic to the rrw code. But in the interest of getting >>> it fixed expeditiously, I'm open to whatever code you have. >>> >> >> I took the chicken way out and implemented the priority read_lock >> you proposed, as it is the least intrusive patch and minimizes the >> need for testing. It is currently under test here. I was going to >> send it out on monday, but can also do so later this day for a >> quick review. >> > > http://cr.illumos.org/~webrev/sensille/5981_deadlock/ > > >> -Arne >> >> >>> --matt >>> *illumos-zfs* | Archives >>> <https://www.listbox.com/member/archive/182191/=now> >>> <https://www.listbox.com/member/archive/rss/182191/23449240-486d00d3> | >>> Modify >>> <https://www.listbox.com/member/?&> >>> Your Subscription [Powered by Listbox] <http://www.listbox.com> >>> >>> >> >> >> ------------------------------------------- >> illumos-zfs >> Archives: https://www.listbox.com/member/archive/182191/=now >> RSS Feed: >> https://www.listbox.com/member/archive/rss/182191/23449240-486d00d3 >> Modify Your Subscription: >> https://www.listbox.com/member/?& >> >> Powered by Listbox: http://www.listbox.com >> > > > > ------------------------------------------- > illumos-zfs > Archives: https://www.listbox.com/member/archive/182191/=now > RSS Feed: > https://www.listbox.com/member/archive/rss/182191/21635000-ebd1d460 > Modify Your Subscription: > https://www.listbox.com/member/?member_id=21635000&id_secret=21635000-73dc201a > Powered by Listbox: http://www.listbox.com >
_______________________________________________ developer mailing list [email protected] http://lists.open-zfs.org/mailman/listinfo/developer
