Re: [Ocfs2-devel] [PATCH] ocfs2/o2hb: check len for bio_add_page() to avoid submitting incorrect bio

2018-04-10 Thread Changwei Ge
Hi Jun,

Thanks for your patch.

I just applied your patch to my tree and ran ocfs2-test.
Unfortunately, the very first case fails while making the fs, since the bio
can't accommodate more than 16 vecs.

Of course, this is not introduced by your patch. Your patch just makes this
hidden issue visible.

I just want to point out that if this patch is applied, the cluster can't
scale beyond 16 nodes.

I will try to post a patch to fix it.

Log attached:

Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329330] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 0, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329331] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 1, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329332] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 2, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329333] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 3, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329334] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 4, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329335] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 5, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329336] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 6, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329337] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 7, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329338] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 8, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329339] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 9, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329339] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 10, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329340] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 11, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329341] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 12, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329342] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 13, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329343] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 14, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329344] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 15, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329345] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:463 page 16, vec_len = 4096, vec_start = 0
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329346] (mkfs.ocfs2,27479,2):o2hb_setup_one_bio:471 ERROR: Adding page[16] to bio failed, page ea0002d7ed40, len 0, vec_len 4096, vec_start 0, bi_sector 8192
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329357] (mkfs.ocfs2,27479,2):o2hb_read_slots:500 ERROR: status = -5
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329361] (mkfs.ocfs2,27479,2):o2hb_populate_slot_data:1911 ERROR: status = -5
Apr 11 08:37:02 cvknode-ocfs2test-e0501-1 kernel: [  942.329364] (mkfs.ocfs2,27479,2):o2hb_region_dev_write:2012 ERROR: status = -5


On 2018/3/28 11:52, piaojun wrote:
> We need to check len for bio_add_page() to make sure the bio has been set up
> correctly; otherwise we may submit incorrect data to the device.
> 
> Signed-off-by: Jun Piao 
> Reviewed-by: Yiwen Jiang 
> ---
>   fs/ocfs2/cluster/heartbeat.c | 11 ++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> index ea8c551..43ad79f 100644
> --- a/fs/ocfs2/cluster/heartbeat.c
> +++ b/fs/ocfs2/cluster/heartbeat.c
> @@ -570,7 +570,16 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>  		     current_page, vec_len, vec_start);
>
>  		len = bio_add_page(bio, page, vec_len, vec_start);
> -		if (len != vec_len) break;
> +		if (len != vec_len) {
> +			mlog(ML_ERROR, "Adding page[%d] to bio failed, "
> +			     "page %p, len %d, vec_len %u, vec_start %u, "
> +			     "bi_sector %llu\n", current_page, page, len,
> +			     vec_len, vec_start,
> +			     (unsigned long long)bio->bi_iter.bi_sector);
> +			bio_put(bio);
> +			bio = ERR_PTR(-EFAULT);
> +			return bio;
> +		}
>
>  		cs += vec_len / (PAGE_SIZE/spp);
> 

Re: [Ocfs2-devel] [PATCH] ocfs2/o2hb: check len for bio_add_page() to avoid submitting incorrect bio

2018-03-28 Thread piaojun
Hi Changwei and Joseph,

EIO sounds more reasonable. Thanks a lot for your suggestions; I will
send patch v2 later.

thanks,
Jun

On 2018/3/29 9:09, Changwei Ge wrote:
> Hi Jun,
> 
> On 2018/3/28 17:51, Joseph Qi wrote:
>>
>>
>> On 18/3/28 15:02, piaojun wrote:
>>> Hi Joseph,
>>>
>>> On 2018/3/28 12:58, Joseph Qi wrote:


>>>> On 18/3/28 11:50, piaojun wrote:
>>>>> We need to check len for bio_add_page() to make sure the bio has been set up
>>>>> correctly; otherwise we may submit incorrect data to the device.
>>>>>
>>>>> Signed-off-by: Jun Piao
>>>>> Reviewed-by: Yiwen Jiang
>>>>> ---
>>>>>   fs/ocfs2/cluster/heartbeat.c | 11 ++-
>>>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
>>>>> index ea8c551..43ad79f 100644
>>>>> --- a/fs/ocfs2/cluster/heartbeat.c
>>>>> +++ b/fs/ocfs2/cluster/heartbeat.c
>>>>> @@ -570,7 +570,16 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>>>>>  current_page, vec_len, vec_start);
>>>>>
>>>>>   len = bio_add_page(bio, page, vec_len, vec_start);
>>>>> - if (len != vec_len) break;
>>>>> + if (len != vec_len) {
>>>>> + mlog(ML_ERROR, "Adding page[%d] to bio failed, "
>>>>> +  "page %p, len %d, vec_len %u, vec_start %u, "
>>>>> +  "bi_sector %llu\n", current_page, page, len,
>>>>> +  vec_len, vec_start,
>>>>> +  (unsigned long long)bio->bi_iter.bi_sector);
>>>>> + bio_put(bio);
>>>>> + bio = ERR_PTR(-EFAULT);
>>>>
>>>> IMO, EFAULT is not an appropriate error code here.
>>>> When __bio_add_page returns 0, some of the cases are caused by a failed bio check.
>>>> Also, I've noticed that several other callers just use ENOMEM, so I think
>>>> EINVAL or ENOMEM may be better.
>>>
>>> __bio_add_page was removed by patch c66a14d07c13, and I notice that
>>> other callers always use -EFAULT or -EIO. I'm afraid we are not looking at
>>> the same kernel source.
>>>
>>
>> Oops... Yes, I was looking at an old kernel...
>> EIO sounds reasonable, but I don't see why EFAULT would fit, since it means
>> "Bad address".
>
> I agree with Joseph that EFAULT seems unreasonable for the error caught here.
> But your trick looks good to me.
> After applying a more appropriate error number, please feel free to add my:
> Reviewed-by: Changwei Ge 
> 
> Thanks,
> Changwei
> 
> 
>>
>> Thanks,
>> Joseph
>>
>> ___
>> Ocfs2-devel mailing list
>> Ocfs2-devel@oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>
> .
> 



Re: [Ocfs2-devel] [PATCH] ocfs2/o2hb: check len for bio_add_page() to avoid submitting incorrect bio

2018-03-28 Thread Changwei Ge
Hi Jun,

On 2018/3/28 17:51, Joseph Qi wrote:
> 
> 
> On 18/3/28 15:02, piaojun wrote:
>> Hi Joseph,
>>
>> On 2018/3/28 12:58, Joseph Qi wrote:
>>>
>>>
>>> On 18/3/28 11:50, piaojun wrote:
>>>> We need to check len for bio_add_page() to make sure the bio has been set up
>>>> correctly; otherwise we may submit incorrect data to the device.
>>>>
>>>> Signed-off-by: Jun Piao
>>>> Reviewed-by: Yiwen Jiang
>>>> ---
>>>>   fs/ocfs2/cluster/heartbeat.c | 11 ++-
>>>>   1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
>>>> index ea8c551..43ad79f 100644
>>>> --- a/fs/ocfs2/cluster/heartbeat.c
>>>> +++ b/fs/ocfs2/cluster/heartbeat.c
>>>> @@ -570,7 +570,16 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>>>>  current_page, vec_len, vec_start);
>>>>
>>>>   len = bio_add_page(bio, page, vec_len, vec_start);
>>>> -  if (len != vec_len) break;
>>>> +  if (len != vec_len) {
>>>> +  mlog(ML_ERROR, "Adding page[%d] to bio failed, "
>>>> +   "page %p, len %d, vec_len %u, vec_start %u, "
>>>> +   "bi_sector %llu\n", current_page, page, len,
>>>> +   vec_len, vec_start,
>>>> +   (unsigned long long)bio->bi_iter.bi_sector);
>>>> +  bio_put(bio);
>>>> +  bio = ERR_PTR(-EFAULT);
>>>
>>> IMO, EFAULT is not an appropriate error code here.
>>> When __bio_add_page returns 0, some of the cases are caused by a failed bio check.
>>> Also, I've noticed that several other callers just use ENOMEM, so I think
>>> EINVAL or ENOMEM may be better.
>>
>> __bio_add_page was removed by patch c66a14d07c13, and I notice that
>> other callers always use -EFAULT or -EIO. I'm afraid we are not looking at
>> the same kernel source.
>>
>
> Oops... Yes, I was looking at an old kernel...
> EIO sounds reasonable, but I don't see why EFAULT would fit, since it means
> "Bad address".

I agree with Joseph that EFAULT seems unreasonable for the error caught here.
But your trick looks good to me.
After applying a more appropriate error number, please feel free to add my:
Reviewed-by: Changwei Ge 

Thanks,
Changwei


> 
> Thanks,
> Joseph
> 
> 



Re: [Ocfs2-devel] [PATCH] ocfs2/o2hb: check len for bio_add_page() to avoid submitting incorrect bio

2018-03-28 Thread Joseph Qi


On 18/3/28 15:02, piaojun wrote:
> Hi Joseph,
> 
> On 2018/3/28 12:58, Joseph Qi wrote:
>>
>>
>> On 18/3/28 11:50, piaojun wrote:
>>> We need to check len for bio_add_page() to make sure the bio has been set up
>>> correctly; otherwise we may submit incorrect data to the device.
>>>
>>> Signed-off-by: Jun Piao 
>>> Reviewed-by: Yiwen Jiang 
>>> ---
>>>  fs/ocfs2/cluster/heartbeat.c | 11 ++-
>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
>>> index ea8c551..43ad79f 100644
>>> --- a/fs/ocfs2/cluster/heartbeat.c
>>> +++ b/fs/ocfs2/cluster/heartbeat.c
>>> @@ -570,7 +570,16 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>>>  current_page, vec_len, vec_start);
>>>
>>> len = bio_add_page(bio, page, vec_len, vec_start);
>>> -   if (len != vec_len) break;
>>> +   if (len != vec_len) {
>>> +   mlog(ML_ERROR, "Adding page[%d] to bio failed, "
>>> +"page %p, len %d, vec_len %u, vec_start %u, "
>>> +"bi_sector %llu\n", current_page, page, len,
>>> +vec_len, vec_start,
>>> +(unsigned long long)bio->bi_iter.bi_sector);
>>> +   bio_put(bio);
>>> +   bio = ERR_PTR(-EFAULT);
>>
>> IMO, EFAULT is not an appropriate error code here.
>> When __bio_add_page returns 0, some of the cases are caused by a failed bio check.
>> Also, I've noticed that several other callers just use ENOMEM, so I think
>> EINVAL or ENOMEM may be better.
>
> __bio_add_page was removed by patch c66a14d07c13, and I notice that
> other callers always use -EFAULT or -EIO. I'm afraid we are not looking at
> the same kernel source.
>

Oops... Yes, I was looking at an old kernel...
EIO sounds reasonable, but I don't see why EFAULT would fit, since it means "Bad address".

Thanks,
Joseph



Re: [Ocfs2-devel] [PATCH] ocfs2/o2hb: check len for bio_add_page() to avoid submitting incorrect bio

2018-03-27 Thread Joseph Qi


On 18/3/28 11:50, piaojun wrote:
> We need to check len for bio_add_page() to make sure the bio has been set up
> correctly; otherwise we may submit incorrect data to the device.
> 
> Signed-off-by: Jun Piao 
> Reviewed-by: Yiwen Jiang 
> ---
>  fs/ocfs2/cluster/heartbeat.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
> index ea8c551..43ad79f 100644
> --- a/fs/ocfs2/cluster/heartbeat.c
> +++ b/fs/ocfs2/cluster/heartbeat.c
> @@ -570,7 +570,16 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
>current_page, vec_len, vec_start);
> 
>   len = bio_add_page(bio, page, vec_len, vec_start);
> - if (len != vec_len) break;
> + if (len != vec_len) {
> + mlog(ML_ERROR, "Adding page[%d] to bio failed, "
> +  "page %p, len %d, vec_len %u, vec_start %u, "
> +  "bi_sector %llu\n", current_page, page, len,
> +  vec_len, vec_start,
> +  (unsigned long long)bio->bi_iter.bi_sector);
> + bio_put(bio);
> + bio = ERR_PTR(-EFAULT);

IMO, EFAULT is not an appropriate error code here.
When __bio_add_page returns 0, some of the cases are caused by a failed bio check.
Also, I've noticed that several other callers just use ENOMEM, so I think
EINVAL or ENOMEM may be better.

Thanks,
Joseph

> + return bio;
> + }
> 
>   cs += vec_len / (PAGE_SIZE/spp);
>   vec_start = 0;
> 
