Re: [Ocfs2-devel] [PATCH 5/5] ocfs2: Implement delayed dropping of last dquot reference
On Mon 20-01-14 22:47:43, Srinivas Eeda wrote:
> On 01/20/2014 07:31 AM, Goldwyn Rodrigues wrote:
> > On 01/16/2014 04:58 PM, Jan Kara wrote:
> > > On Thu 16-01-14 23:28:49, Jan Kara wrote:
> > > > We cannot drop last dquot reference from downconvert thread as that
> > > > creates the following deadlock:
> > > >
> > > > NODE 1                                  NODE2
> > > > holds dentry lock for 'foo'
> > > > holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE
> > > >                                         dquot_initialize(bar)
> > > >                                           ocfs2_dquot_acquire()
> > > >                                             ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
> > > >                                             ...
> > > > downconvert thread (triggered from another
> > > > node or a different process from NODE2)
> > > >   ocfs2_dentry_post_unlock()
> > > >     ...
> > > >     iput(foo)
> > > >       ocfs2_evict_inode(foo)
> > > >         ocfs2_clear_inode(foo)
> > > >           dquot_drop(inode)
> > > >             ...
> > > >             ocfs2_dquot_release()
> > > >               ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
> > > >                - blocks
> > > >                                             finds we need more space in
> > > >                                             quota file
> > > >                                             ...
> > > >                                             ocfs2_extend_no_holes()
> > > >                                               ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE)
> > > >                                                 - deadlocks waiting for
> > > >                                                   downconvert thread
> > > >
> > > > We solve the problem by postponing dropping of the last dquot
> > > > reference to a workqueue if it happens from the downconvert thread.
> > >   Hum, now looking again into ocfs2_clear_inode() there are more
> > > problems than I originally thought. Look for example at
> > > ocfs2_mark_lockres_freeing(). That will block on rw/inode/open lock if
> > > there is downconvert pending waiting for that downconvert to finish.
> > > However that never happens when ocfs2_clear_inode() is called from the
> > > downconvert thread. So we are back to square one - I don't see a way
> > > how to fix these deadlocks without postponing dropping of inode
> > > reference to a workqueue :(.
> >
> > Since the reason of the unlink performance is the delay in calling
> > ocfs2_open_unlock(), and the ocfs2_mark_lockres_freeing() comes after
> > ocfs2_open_unlock(): can we move the call to ocfs2_open_unlock() to
> > ocfs2_evict_inode() and then perform ocfs2_clear_inode() in a deferred
> > way?
> once ocfs2_evict_inode() returns, vfs would destroy the inode, so I think
> we should do the cleanup before that and hence we cannot defer
> ocfs2_clear_inode from inside ocfs2_evict_inode
  Yes.

> Maybe we should queue ocfs2_blocking_ast call itself to down convert
> thread. That way it doesn't prevent down convert thread from clearing the
> inode. Once the down convert thread comes to process the queued ast/bast
> and finds lockres cleared it can just return.
  Yes, I was thinking about some trick like this myself. But I don't really
know all the constraints in that area so I'm not sure what will work and
what not and I didn't have time to look deep into that yet.

> > > 								Honza
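The suggestion at the end of this message (queue the blocking notification, and let the worker return early if the lock resource has been cleared in the meantime) might look roughly like the userspace sketch below. Every name in it (`struct res`, `struct request`, `queue_push`, `queue_run`) is hypothetical and is not taken from ocfs2. It shows only the "skip stale requests" control flow, under the assumption that a single thread owns the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock resource: only the state needed for this sketch. */
struct res {
	bool freeing;		/* set once the owner has torn the resource down */
	int downconverts;	/* how many queued requests were actually handled */
};

/* Hypothetical queued blocking notification (stands in for a queued bast). */
struct request {
	struct res *res;
	struct request *next;
};

/* Simple FIFO owned by a single worker thread, so no locking is shown. */
struct queue {
	struct request *head, *tail;
};

static void queue_push(struct queue *q, struct request *r)
{
	assert(r != NULL && r->res != NULL);
	r->next = NULL;
	if (q->tail)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
}

/* Process every queued request; returns how many were skipped as stale. */
static int queue_run(struct queue *q)
{
	int skipped = 0;
	struct request *r = q->head;

	q->head = q->tail = NULL;
	while (r) {
		struct request *next = r->next;

		if (r->res->freeing)
			skipped++;		/* resource already cleared: nothing to do */
		else
			r->res->downconverts++;	/* placeholder for the real downconvert work */
		r = next;
	}
	return skipped;
}
```

A real implementation would also have to keep the resource's memory valid until the queued request is examined (for example by holding a reference), which this sketch sidesteps by using objects that outlive the queue.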
Re: [Ocfs2-devel] [PATCH 5/5] ocfs2: Implement delayed dropping of last dquot reference
On 01/16/2014 04:58 PM, Jan Kara wrote:
> On Thu 16-01-14 23:28:49, Jan Kara wrote:
> > We cannot drop last dquot reference from downconvert thread as that
> > creates the following deadlock:
> >
> > NODE 1                                  NODE2
> > holds dentry lock for 'foo'
> > holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE
> >                                         dquot_initialize(bar)
> >                                           ocfs2_dquot_acquire()
> >                                             ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
> >                                             ...
> > downconvert thread (triggered from another
> > node or a different process from NODE2)
> >   ocfs2_dentry_post_unlock()
> >     ...
> >     iput(foo)
> >       ocfs2_evict_inode(foo)
> >         ocfs2_clear_inode(foo)
> >           dquot_drop(inode)
> >             ...
> >             ocfs2_dquot_release()
> >               ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
> >                - blocks
> >                                             finds we need more space in
> >                                             quota file
> >                                             ...
> >                                             ocfs2_extend_no_holes()
> >                                               ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE)
> >                                                 - deadlocks waiting for
> >                                                   downconvert thread
> >
> > We solve the problem by postponing dropping of the last dquot reference
> > to a workqueue if it happens from the downconvert thread.
>   Hum, now looking again into ocfs2_clear_inode() there are more problems
> than I originally thought. Look for example at
> ocfs2_mark_lockres_freeing(). That will block on rw/inode/open lock if
> there is downconvert pending waiting for that downconvert to finish.
> However that never happens when ocfs2_clear_inode() is called from the
> downconvert thread. So we are back to square one - I don't see a way how
> to fix these deadlocks without postponing dropping of inode reference to
> a workqueue :(.

Since the reason of the unlink performance is the delay in calling
ocfs2_open_unlock(), and the ocfs2_mark_lockres_freeing() comes after
ocfs2_open_unlock(): can we move the call to ocfs2_open_unlock() to
ocfs2_evict_inode() and then perform ocfs2_clear_inode() in a deferred way?

> 								Honza
Re: [Ocfs2-devel] [PATCH 5/5] ocfs2: Implement delayed dropping of last dquot reference
On 01/20/2014 07:31 AM, Goldwyn Rodrigues wrote:
> On 01/16/2014 04:58 PM, Jan Kara wrote:
> > On Thu 16-01-14 23:28:49, Jan Kara wrote:
> > > We cannot drop last dquot reference from downconvert thread as that
> > > creates the following deadlock:
> > >
> > > NODE 1                                  NODE2
> > > holds dentry lock for 'foo'
> > > holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE
> > >                                         dquot_initialize(bar)
> > >                                           ocfs2_dquot_acquire()
> > >                                             ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
> > >                                             ...
> > > downconvert thread (triggered from another
> > > node or a different process from NODE2)
> > >   ocfs2_dentry_post_unlock()
> > >     ...
> > >     iput(foo)
> > >       ocfs2_evict_inode(foo)
> > >         ocfs2_clear_inode(foo)
> > >           dquot_drop(inode)
> > >             ...
> > >             ocfs2_dquot_release()
> > >               ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
> > >                - blocks
> > >                                             finds we need more space in
> > >                                             quota file
> > >                                             ...
> > >                                             ocfs2_extend_no_holes()
> > >                                               ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE)
> > >                                                 - deadlocks waiting for
> > >                                                   downconvert thread
> > >
> > > We solve the problem by postponing dropping of the last dquot
> > > reference to a workqueue if it happens from the downconvert thread.
> >   Hum, now looking again into ocfs2_clear_inode() there are more
> > problems than I originally thought. Look for example at
> > ocfs2_mark_lockres_freeing(). That will block on rw/inode/open lock if
> > there is downconvert pending waiting for that downconvert to finish.
> > However that never happens when ocfs2_clear_inode() is called from the
> > downconvert thread. So we are back to square one - I don't see a way how
> > to fix these deadlocks without postponing dropping of inode reference
> > to a workqueue :(.
>
> Since the reason of the unlink performance is the delay in calling
> ocfs2_open_unlock(), and the ocfs2_mark_lockres_freeing() comes after
> ocfs2_open_unlock(): can we move the call to ocfs2_open_unlock() to
> ocfs2_evict_inode() and then perform ocfs2_clear_inode() in a deferred
> way?
once ocfs2_evict_inode() returns, vfs would destroy the inode, so I think we
should do the cleanup before that and hence we cannot defer
ocfs2_clear_inode from inside ocfs2_evict_inode

Maybe we should queue ocfs2_blocking_ast call itself to down convert
thread. That way it doesn't prevent down convert thread from clearing the
inode. Once the down convert thread comes to process the queued ast/bast
and finds lockres cleared it can just return.

> > 								Honza
[Ocfs2-devel] [PATCH 5/5] ocfs2: Implement delayed dropping of last dquot reference
We cannot drop last dquot reference from downconvert thread as that
creates the following deadlock:

NODE 1                                  NODE2
holds dentry lock for 'foo'
holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE
                                        dquot_initialize(bar)
                                          ocfs2_dquot_acquire()
                                            ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
                                            ...
downconvert thread (triggered from another
node or a different process from NODE2)
  ocfs2_dentry_post_unlock()
    ...
    iput(foo)
      ocfs2_evict_inode(foo)
        ocfs2_clear_inode(foo)
          dquot_drop(inode)
            ...
            ocfs2_dquot_release()
              ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
               - blocks
                                            finds we need more space in
                                            quota file
                                            ...
                                            ocfs2_extend_no_holes()
                                              ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE)
                                                - deadlocks waiting for
                                                  downconvert thread

We solve the problem by postponing dropping of the last dquot reference to
a workqueue if it happens from the downconvert thread.

Signed-off-by: Jan Kara <j...@suse.cz>
---
 fs/ocfs2/ocfs2.h        |  5 +++++
 fs/ocfs2/quota.h        |  2 ++
 fs/ocfs2/quota_global.c | 35 +++++++++++++++++++++++++++++++++++
 fs/ocfs2/super.c        |  8 ++++++++
 4 files changed, 50 insertions(+)

diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index ca81f6b49236..f6134345fe42 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -30,6 +30,7 @@
 #include <linux/sched.h>
 #include <linux/wait.h>
 #include <linux/list.h>
+#include <linux/llist.h>
 #include <linux/rbtree.h>
 #include <linux/workqueue.h>
 #include <linux/kref.h>
@@ -410,6 +411,10 @@ struct ocfs2_super
 	struct list_head blocked_lock_list;
 	unsigned long blocked_lock_count;
 
+	/* List of dquot structures to drop last reference to */
+	struct llist_head dquot_drop_list;
+	struct work_struct dquot_drop_work;
+
 	wait_queue_head_t osb_mount_event;
 
 	/* Truncate log info */
diff --git a/fs/ocfs2/quota.h b/fs/ocfs2/quota.h
index d5ab56cbe5c5..f266d67df3c6 100644
--- a/fs/ocfs2/quota.h
+++ b/fs/ocfs2/quota.h
@@ -28,6 +28,7 @@ struct ocfs2_dquot {
 	unsigned int dq_use_count;	/* Number of nodes having reference to this entry in global quota file */
 	s64 dq_origspace;	/* Last globally synced space usage */
 	s64 dq_originodes;	/* Last globally synced inode usage */
+	struct llist_node list;	/* Member of list of dquots to drop */
 };
 
 /* Description of one chunk to recover in memory */
@@ -110,6 +111,7 @@ int ocfs2_read_quota_phys_block(struct inode *inode, u64 p_block,
 int ocfs2_create_local_dquot(struct dquot *dquot);
 int ocfs2_local_release_dquot(handle_t *handle, struct dquot *dquot);
 int ocfs2_local_write_dquot(struct dquot *dquot);
+void ocfs2_drop_dquot_refs(struct work_struct *work);
 
 extern const struct dquot_operations ocfs2_quota_operations;
 extern struct quota_format_type ocfs2_quota_format;
diff --git a/fs/ocfs2/quota_global.c b/fs/ocfs2/quota_global.c
index aaa50611ec66..7921e209c64b 100644
--- a/fs/ocfs2/quota_global.c
+++ b/fs/ocfs2/quota_global.c
@@ -10,6 +10,7 @@
 #include <linux/jiffies.h>
 #include <linux/writeback.h>
 #include <linux/workqueue.h>
+#include <linux/llist.h>
 
 #include <cluster/masklog.h>
 
@@ -679,6 +680,27 @@ static int ocfs2_calc_qdel_credits(struct super_block *sb, int type)
 	       OCFS2_INODE_UPDATE_CREDITS;
 }
 
+void ocfs2_drop_dquot_refs(struct work_struct *work)
+{
+	struct ocfs2_super *osb = container_of(work, struct ocfs2_super,
+					       dquot_drop_work);
+	struct llist_node *list;
+	struct ocfs2_dquot *odquot, *next_odquot;
+
+	list = llist_del_all(&osb->dquot_drop_list);
+	llist_for_each_entry_safe(odquot, next_odquot, list, list) {
+		/* Drop the reference we acquired in ocfs2_dquot_release() */
+		dqput(&odquot->dq_dquot);
+	}
+}
+
+/*
+ * Called when the last reference to dquot is dropped. If we are called from
+ * downconvert thread, we cannot do all the handling here because grabbing
+ * quota lock could deadlock (the node holding the quota lock could need some
+ * other cluster lock to proceed but with blocked downconvert thread we cannot
+ * release any lock).
+ */
 static int ocfs2_release_dquot(struct dquot *dquot)
 {
 	handle_t *handle;
@@ -694,6 +716,19 @@ static int ocfs2_release_dquot(struct dquot *dquot)
 	/* Check whether we are not racing with some other dqget() */
 	if (atomic_read(&dquot->dq_count) > 1)
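For readers unfamiliar with the kernel's lock-less list, the deferred-drop pattern used by `ocfs2_drop_dquot_refs()` can be approximated in userspace with C11 atomics. This is a minimal sketch and not the patch's code: `struct obj`, `drop_list_add()`, `drop_list_del_all()` and `drop_worker()` are hypothetical names, and the plain integer reference count stands in for a dquot reference. The return value of `drop_list_add()` mirrors the kernel's `llist_add()`, which reports whether the list was empty before the push; whether the (truncated) remainder of the patch uses that to decide when to queue the work item is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a dquot: a reference count and a list link. */
struct obj {
	int refcount;
	struct obj *next;	/* link in the deferred-drop list */
};

/* Head of a lock-free, push-only list (similar in spirit to llist_head). */
struct drop_list {
	_Atomic(struct obj *) first;
};

/*
 * Queue an object for deferred release. Returns 1 if the list was empty
 * before the push, which a caller could use as the cue to schedule the worker.
 */
static int drop_list_add(struct drop_list *l, struct obj *o)
{
	struct obj *old = atomic_load(&l->first);

	do {
		o->next = old;	/* on CAS failure 'old' is refreshed, so retry */
	} while (!atomic_compare_exchange_weak(&l->first, &old, o));
	return old == NULL;
}

/* Detach the whole list in one atomic step (similar to llist_del_all). */
static struct obj *drop_list_del_all(struct drop_list *l)
{
	return atomic_exchange(&l->first, (struct obj *)NULL);
}

/* Worker body: drop the reference held on behalf of each queued object. */
static int drop_worker(struct drop_list *l)
{
	struct obj *o = drop_list_del_all(l);
	int dropped = 0;

	while (o) {
		struct obj *next = o->next;	/* read the link before releasing o */

		assert(o->refcount > 0);
		o->refcount--;
		dropped++;
		o = next;
	}
	return dropped;
}
```

The point of the design is that the producer (here, the thread that must not block) only performs a non-blocking push, while everything that may sleep or take cluster locks runs later in the worker's context.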
Re: [Ocfs2-devel] [PATCH 5/5] ocfs2: Implement delayed dropping of last dquot reference
On Thu 16-01-14 23:28:49, Jan Kara wrote:
> We cannot drop last dquot reference from downconvert thread as that
> creates the following deadlock:
>
> NODE 1                                  NODE2
> holds dentry lock for 'foo'
> holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE
>                                         dquot_initialize(bar)
>                                           ocfs2_dquot_acquire()
>                                             ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
>                                             ...
> downconvert thread (triggered from another
> node or a different process from NODE2)
>   ocfs2_dentry_post_unlock()
>     ...
>     iput(foo)
>       ocfs2_evict_inode(foo)
>         ocfs2_clear_inode(foo)
>           dquot_drop(inode)
>             ...
>             ocfs2_dquot_release()
>               ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
>                - blocks
>                                             finds we need more space in
>                                             quota file
>                                             ...
>                                             ocfs2_extend_no_holes()
>                                               ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE)
>                                                 - deadlocks waiting for
>                                                   downconvert thread
>
> We solve the problem by postponing dropping of the last dquot reference to
> a workqueue if it happens from the downconvert thread.
  Hum, now looking again into ocfs2_clear_inode() there are more problems
than I originally thought. Look for example at
ocfs2_mark_lockres_freeing(). That will block on rw/inode/open lock if
there is downconvert pending waiting for that downconvert to finish.
However that never happens when ocfs2_clear_inode() is called from the
downconvert thread. So we are back to square one - I don't see a way how to
fix these deadlocks without postponing dropping of inode reference to a
workqueue :(.

								Honza