[Ocfs2-devel] [RFC] ocfs2: Idea to make ocfs2_search_chain more efficient
In ocfs2_search_chain, I found the search for an available gd (group descriptor) is inefficient in some circumstances:

1) The LUN has a great many gds, so the search reads lots of unavailable gds (free bits are zero);
2) There are not too many gds, but the available gds are scattered among unavailable ones (fragmentation).

So I have an idea to optimize the search:

1) Use the reserved member in ocfs2_group_desc to build an "available" chain list (every gd in the list has free bits greater than zero);
2) At the beginning, this list is identical to the original chain list;
3) During allocation, search only the available chain list;
4) After each allocation, if a gd's free bits drop to zero, remove it from the available chain list;
5) After each reclaim, if a gd's free bits change from zero to positive, append it to the head of the available chain list.

Starting from the state outlined above, no unavailable gd will ever be read. Does anyone have better ideas or advice?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
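Steps 1-5 above can be sketched as a small userspace model. All names here are illustrative, and this is a plain in-memory list, not the on-disk ocfs2_group_desc layout (where the link would live in the reserved member):

```c
/* Userspace model of the proposed "available chain".  Illustrative only:
 * not the on-disk ocfs2_group_desc format. */
#include <stddef.h>

struct gd {
	unsigned int free_bits;
	struct gd *next_avail;	/* would live in the reserved member on disk */
};

/* Step 3: search only gds known to have free bits; a zero-free gd is
 * never visited. */
static struct gd *search_avail(struct gd *head, unsigned int need)
{
	for (struct gd *g = head; g; g = g->next_avail)
		if (g->free_bits >= need)
			return g;
	return NULL;
}

/* Step 4: after an allocation empties a gd, unlink it from the list. */
static void remove_if_empty(struct gd **head, struct gd *g)
{
	if (g->free_bits)
		return;
	for (struct gd **pp = head; *pp; pp = &(*pp)->next_avail) {
		if (*pp == g) {
			*pp = g->next_avail;
			g->next_avail = NULL;
			return;
		}
	}
}

/* Step 5: a reclaim takes a gd from zero back to positive free bits,
 * so push it onto the head of the available list. */
static void push_avail(struct gd **head, struct gd *g)
{
	g->next_avail = *head;
	*head = g;
}
```

Once a gd leaves the list it is simply never traversed again until a reclaim pushes it back, which is the "no unavailable gd will ever be read" property claimed above.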
Re: [Ocfs2-devel] [RFC] ocfs2: Idea to make ocfs2_search_chain more efficient
Hi Norton,

While the local alloc bitmap is enabled, the chance of a particular gd ending up with zero free bits is very small. The local alloc bitmap picks the next gd once the number of free bits falls below the local alloc bitmap size, so the next gd is picked while the current gd still has free space.

Having said that, I understand the problem you are describing. Since we only move the gd with more free bits ahead of the previous unusable gd, we end up rescanning unusable gds whenever the newly picked gd becomes unusable. This is inefficient and causes performance problems when the fs gets fragmented.

I am not sure we will be allowed to use bg_reserved2 ... but if we are, then we could use it to remember the next gd we should look at. For example, assume there are 'N' gds and GD10 currently has the most free space, so the order of gds is GD10, GD1, GD2, ... GDN. If we end up using most of the space in GD10, then according to the current logic we re-read GD1, GD2 .. GD9 and then read GD11 and pick that (assuming it has free space). But if GD10 instead remembered GD11 as the next one to look at, then we could skip reading GD1..GD9.
Thanks,
--Srini

On 08/24/2015 05:30 AM, Norton.Zhu wrote:
> [...]
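Srini's skip-pointer idea can be modeled in a few lines of userspace C. The field name next_hint is an illustrative stand-in for what bg_reserved2 would hold; this is a toy model, not ocfs2 code:

```c
/* Toy userspace model of the "remember the next gd" hint.  Field and
 * function names are illustrative; this is not the ocfs2 on-disk format. */

struct gd {
	unsigned int free_bits;
	int next_hint;	/* index of the next gd to try; -1 ends the search */
};

/* Walk the hint chain from 'start', counting how many descriptors must
 * be read before one with free bits is found. */
static int reads_to_find(const struct gd *gds, int start, int *found)
{
	int reads = 0;

	for (int i = start; i >= 0; i = gds[i].next_hint) {
		reads++;
		if (gds[i].free_bits) {
			*found = i;
			return reads;
		}
	}
	*found = -1;
	return reads;
}
```

With the chain ordered GD10, GD1 ... GD9, GD11 and only GD11 non-empty, the current ordering costs 11 descriptor reads; if GD10's hint points straight at GD11, two reads suffice.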
[Ocfs2-devel] [PATCH] ocfs2: direct write will call ocfs2_rw_unlock() twice when doing aio+dio
Orabug: 21612107

The wrong return value is used in ocfs2_file_write_iter().  This causes
ocfs2_rw_unlock() to be called in both write_iter and end_io, triggering a
BUG_ON.  This issue has existed since commit
7da839c475894ea872ec909a5d2e83dddccff5be.

Signed-off-by: Ryan Ding <ryan.d...@oracle.com>
---
 fs/ocfs2/file.c | 28 ++--
 1 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 719f7f4..33efa33 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2372,6 +2372,20 @@ relock:
 	/* buffered aio wouldn't have proper lock coverage today */
 	BUG_ON(written == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT));
 
+	/*
+	 * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io
+	 * function pointer which is called when o_direct io completes so that
+	 * it can unlock our rw lock.
+	 * Unfortunately there are error cases which call end_io and others
+	 * that don't.  so we don't have to unlock the rw_lock if either an
+	 * async dio is going to do it in the future or an end_io after an
+	 * error has already done it.
+	 */
+	if ((written == -EIOCBQUEUED) || (!ocfs2_iocb_is_rw_locked(iocb))) {
+		rw_level = -1;
+		unaligned_dio = 0;
+	}
+
 	if (unlikely(written <= 0))
 		goto no_sync;
 
@@ -2396,20 +2410,6 @@ relock:
 	}
 
 no_sync:
-	/*
-	 * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io
-	 * function pointer which is called when o_direct io completes so that
-	 * it can unlock our rw lock.
-	 * Unfortunately there are error cases which call end_io and others
-	 * that don't.  so we don't have to unlock the rw_lock if either an
-	 * async dio is going to do it in the future or an end_io after an
-	 * error has already done it.
-	 */
-	if ((ret == -EIOCBQUEUED) || (!ocfs2_iocb_is_rw_locked(iocb))) {
-		rw_level = -1;
-		unaligned_dio = 0;
-	}
-
 	if (unaligned_dio) {
 		ocfs2_iocb_clear_unaligned_aio(iocb);
 		mutex_unlock(&OCFS2_I(inode)->ip_unaligned_aio);
-- 
1.7.1
Re: [Ocfs2-devel] [PATCH] ocfs2: direct write will call ocfs2_rw_unlock() twice when doing aio+dio
On 08/24/2015 03:23 PM, Ryan Ding wrote:
> Orabug: 21612107
>
> The wrong return value is used in ocfs2_file_write_iter().  This causes
> ocfs2_rw_unlock() to be called in both write_iter and end_io, triggering
> a BUG_ON.  This issue has existed since commit
> 7da839c475894ea872ec909a5d2e83dddccff5be.

Better say: "This issue is introduced by commit 7da839c47589 (ocfs2: use
__generic_file_write_iter())", or checkpatch will report a style error.
Other than that it looks good.

Reviewed-by: Junxiao Bi <junxiao...@oracle.com>

Thanks,
Junxiao.

> [...]
Re: [Ocfs2-devel] [PATCH] ocfs2: direct write will call ocfs2_rw_unlock() twice when doing aio+dio
On Mon, 24 Aug 2015 15:39:04 +0800 Junxiao Bi <junxiao...@oracle.com> wrote:

> On 08/24/2015 03:23 PM, Ryan Ding wrote:
>> [...]
> Better say: "This issue is introduced by commit 7da839c47589 (ocfs2: use
> __generic_file_write_iter())", or checkpatch will report a style error.
> Other than that it looks good.
>
> Reviewed-by: Junxiao Bi <junxiao...@oracle.com>

Also, we've recently adopted the convention of using the Fixes: tag:

Fixes: 7da839c47589 ("ocfs2: use __generic_file_write_iter()")

And as the patch fixes a regression, we should add the Cc:
sta...@vger.kernel.org tag. And we should cc the author of the bad patch
for review (hi, Al).

I've made these changes and I'll get this patch into Linus's tree this
week, for 4.2.


From: Ryan Ding <ryan.d...@oracle.com>
Subject: ocfs2: direct write will call ocfs2_rw_unlock() twice when doing aio+dio

Orabug: 21612107

ocfs2_file_write_iter() is using the wrong return value ('written').  This
causes ocfs2_rw_unlock() to be called in both write_iter and end_io,
triggering a BUG_ON.  This issue was introduced by commit 7da839c47589
("ocfs2: use __generic_file_write_iter()").

Fixes: 7da839c47589 ("ocfs2: use __generic_file_write_iter()")
Signed-off-by: Ryan Ding <ryan.d...@oracle.com>
Reviewed-by: Junxiao Bi <junxiao...@oracle.com>
Cc: Al Viro <v...@zeniv.linux.org.uk>
Cc: Mark Fasheh <mfas...@suse.com>
Cc: Joel Becker <jl...@evilplan.org>
Cc: <sta...@vger.kernel.org>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/file.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff -puN fs/ocfs2/file.c~ocfs2-direct-write-will-call-ocfs2_rw_unlock-twice-when-doing-aiodio fs/ocfs2/file.c
--- a/fs/ocfs2/file.c~ocfs2-direct-write-will-call-ocfs2_rw_unlock-twice-when-doing-aiodio
+++ a/fs/ocfs2/file.c
@@ -2366,6 +2366,20 @@ relock:
 	/* buffered aio wouldn't have proper lock coverage today */
 	BUG_ON(written == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT));
 
+	/*
+	 * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io
+	 * function pointer which is called when o_direct io completes so that
+	 * it can unlock our rw lock.
+	 * Unfortunately there are error cases which call end_io and others
+	 * that don't.  so we don't have to unlock the rw_lock if either an
+	 * async dio is going to do it in the future or an end_io after an
+	 * error has already done it.
+	 */
+	if ((written == -EIOCBQUEUED) || (!ocfs2_iocb_is_rw_locked(iocb))) {
+		rw_level = -1;
+		unaligned_dio = 0;
+	}
+
 	if (unlikely(written <= 0))
 		goto no_sync;
 
@@ -2390,20 +2404,6 @@ relock:
 	}
 
 no_sync:
-	/*
-	 * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io
-	 * function pointer which is called when o_direct io completes so that
-	 * it can unlock our rw lock.
-	 * Unfortunately there are error cases which call end_io and others
-	 * that don't.  so we don't have to unlock the rw_lock if either an
-	 * async dio is going to do it in the future or an end_io after an
-	 * error has already done it.
-	 */
-	if ((ret == -EIOCBQUEUED) || (!ocfs2_iocb_is_rw_locked(iocb))) {
-		rw_level = -1;
-		unaligned_dio = 0;
-	}
-
 	if (unaligned_dio && ocfs2_iocb_is_unaligned_aio(iocb)) {
 		ocfs2_iocb_clear_unaligned_aio(iocb);
 		mutex_unlock(&OCFS2_I(inode)->ip_unaligned_aio);
_
Re: [Ocfs2-devel] [RFC] ocfs2: Idea to make ocfs2_search_chain more efficient
Srini has summed it up, mostly. We shouldn't be hitting these suboptimal chains of gds unless the localalloc is empty. Hopefully that's not terribly frequent, but it can happen.

Your solution does increase the complexity of the commit. Pushing a newly-available gd back onto the available list is cheap, as you're already modifying the chain_rec and the gd. But deleting a full gd from the available list requires you to find the previous entry in the list and edit it as well. The lock ordering is hopefully not terrible.

Joel

On Mon, Aug 24, 2015 at 09:57:26AM -0700, Srinivas Eeda wrote:
> [...]
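Joel's cost asymmetry can be seen in a toy singly-linked model (a userspace sketch with illustrative names, not ocfs2 code): pushing at the head touches a fixed set of buffers, while unlinking an interior gd first requires walking to, and modifying, its predecessor.

```c
/* Userspace sketch of the insert-vs-delete cost on a singly linked
 * available list.  Names are illustrative; the returned counts model how
 * many on-disk buffers (chain record / gds) a commit would dirty. */
#include <stddef.h>

struct gd {
	unsigned int free_bits;
	struct gd *next_avail;
};

/* Cheap: only the chain record and the new gd are modified. */
static int push_head(struct gd **head, struct gd *g)
{
	g->next_avail = *head;
	*head = g;
	return 2;	/* chain record + the gd itself */
}

/* Returns the number of entries modified, or 0 if g is not on the list. */
static int unlink_gd(struct gd **head, struct gd *g)
{
	struct gd *prev;

	if (*head == g) {	/* cheap case: g is the first entry */
		*head = g->next_avail;
		g->next_avail = NULL;
		return 2;
	}
	/* otherwise we must walk the list to find the predecessor... */
	for (prev = *head; prev && prev->next_avail != g;
	     prev = prev->next_avail)
		;
	if (!prev)
		return 0;
	prev->next_avail = g->next_avail;	/* ...and edit it as well */
	g->next_avail = NULL;
	return 2;	/* predecessor gd + the unlinked gd */
}
```

The walk in unlink_gd is the extra work Joel points out: each interior deletion means an additional descriptor read and journal edit for the predecessor, on top of the gd being removed.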