So I suppose ctime alone is enough to decide whether an iatt is good or not.
Then why do we also include ia_nlink in the functions gf_zero_fill_stat and
gf_is_zero_filled_stat?

From my investigation, if ia_nlink is set to 0 and the kernel reads the
attributes (with the RCU flag), the kernel records the i_nlink field; a
subsequent LINK operation then fails with a “file does not exist” error:

if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
        error = -ENOENT;
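
For illustration, here is a minimal userspace program that trips over this
check on the FUSE mount (a sketch only; the paths match the reproduction log
quoted later in this thread):

/* Sketch: link(2) on a file whose cached attributes report nlink == 0
 * fails with ENOENT, matching the "link ddd eee" failure in the
 * reproduction log below. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        if (link("/mnt/test/ddd", "/mnt/test/eee") == -1)
                fprintf(stderr, "link: %s\n", strerror(errno)); /* ENOENT */
        return 0;
}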


Best Regards,
George

From: [email protected] 
[mailto:[email protected]] On Behalf Of Pranith Kumar Karampuri
Sent: Wednesday, January 24, 2018 4:15 PM
To: Lian, George (NSB - CN/Hangzhou) <[email protected]>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <[email protected]>; 
[email protected]; Li, Deqian (NSB - CN/Hangzhou) 
<[email protected]>; Sun, Ping (NSB - CN/Hangzhou) 
<[email protected]>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"

If ctime is zero, no xlator should consider it a good iatt. The fact that
this is happening means some xlator is not doing proper checks in the code.
We need to find which xlator that is and fix it. The Internet in our new
office is not working, so I'm not able to have a call today with you guys.
What I would do is put logs in the lookup, link, fstat, and stat calls to see
if anyone unwound an iatt with an ia_nlink count of zero but a nonzero ctime.
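
A minimal sketch of such a debug check, callable from each stat-carrying
callback (the helper name and message are assumptions, not existing code):

static void
debug_check_iatt (xlator_t *this, const char *fop, struct iatt *buf)
{
        /* Flag any iatt unwound with nlink == 0 but a nonzero ctime,
         * i.e. a stat that was not properly zero-filled. */
        if (buf && buf->ia_nlink == 0 && buf->ia_ctime != 0)
                gf_log (this->name, GF_LOG_WARNING,
                        "%s unwound iatt with ia_nlink=0 but nonzero ctime",
                        fop);
}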

On 24 Jan 2018 1:03 pm, "Lian, George (NSB - CN/Hangzhou)"
<[email protected]> wrote:
Hi,  Pranith Kumar,

Can you tell me why buf->ia_nlink needs to be set to 0 in the function
gf_zero_fill_stat(), and which API or application cares about it?
If I remove this line and also update the corresponding check in
gf_is_zero_filled_stat, the issue seems to be gone, but I can't confirm
whether it will lead to other issues.

So could you please double check it and give your comments?

My change is as below:

gf_boolean_t
gf_is_zero_filled_stat (struct iatt *buf)
{
        if (!buf)
                return 1;

        /* Do not use st_dev because it is transformed to store the xlator id
         * in place of the device number. Do not use st_ino because by this time
         * we've already mapped the root ino to 1 so it is not guaranteed to be
         * 0.
         */
//        if ((buf->ia_nlink == 0) && (buf->ia_ctime == 0))
        if (buf->ia_ctime == 0)
                return 1;

        return 0;
}

void
gf_zero_fill_stat (struct iatt *buf)
{
//       buf->ia_nlink = 0;
        buf->ia_ctime = 0;
}
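
For context, my understanding of the intended consumer behavior (a sketch
only; cache_store_iatt is a hypothetical helper, not real code): a caching
layer should treat a zero-filled iatt as untrustworthy, skip caching it, and
fetch a fresh stat instead.

        if (!gf_is_zero_filled_stat (buf))
                cache_store_iatt (inode, buf); /* stat is trustworthy: cache it */
        /* else: drop the stat and issue a fresh lookup to refresh it */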

Thanks & Best Regards
George
From: Lian, George (NSB - CN/Hangzhou)
Sent: Friday, January 19, 2018 10:03 AM
To: Pranith Kumar Karampuri <[email protected]>; Zhou, Cynthia (NSB - CN/Hangzhou) <[email protected]>
Cc: Li, Deqian (NSB - CN/Hangzhou) <[email protected]>; [email protected]; Sun, Ping (NSB - CN/Hangzhou) <[email protected]>

Subject: RE: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"

Hi,
>>> Cool, this works for me too. Send me a mail off-list once you are available 
>>> and we can figure out a way to get into a call and work on this.

Have you reproduced the issue per the steps I listed in
https://bugzilla.redhat.com/show_bug.cgi?id=1531457 and in my last mail?

If not, I would like you to try it yourself; the only difference between your
setup and mine is creating only 2 bricks instead of 6.

And Cynthia can have a session with you if needed while I am not available
next Monday and Tuesday.

Thanks & Best Regards,
George

From: [email protected] [mailto:[email protected]] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 18, 2018 6:03 PM
To: Lian, George (NSB - CN/Hangzhou) <[email protected]>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <[email protected]>; Li, Deqian (NSB - CN/Hangzhou) <[email protected]>; [email protected]; Sun, Ping (NSB - CN/Hangzhou) <[email protected]>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Thu, Jan 18, 2018 at 12:17 PM, Lian, George (NSB - CN/Hangzhou)
<[email protected]> wrote:
Hi,
>>>I actually tried it with replica-2 and replica-3 and then distributed 
>>>replica-2 before replying to the earlier mail. We can have a debugging 
>>>session if you are okay with it.

It is fine if you can't reproduce the issue in your environment,
and I have attached the detailed reproduction log to the Bugzilla ticket FYI.

But I'm sorry, I may be out of office on Monday and Tuesday next week, so a
debug session next Wednesday would be fine for me.

Cool, this works for me too. Send me a mail off-list once you are available and 
we can figure out a way to get into a call and work on this.



Pasting the detailed reproduction log here FYI:
root@ubuntu:~# gluster peer probe ubuntu
peer probe: success. Probe on localhost not needed
root@ubuntu:~# gluster v create test replica 2 ubuntu:/home/gfs/b1 
ubuntu:/home/gfs/b2 force
volume create: test: success: please start the volume to access data
root@ubuntu:~# gluster v start test
volume start: test: success
root@ubuntu:~# gluster v info test

Volume Name: test
Type: Replicate
Volume ID: fef5fca3-81d9-46d3-8847-74cde6f701a5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: ubuntu:/home/gfs/b1
Brick2: ubuntu:/home/gfs/b2
Options Reconfigured:
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
root@ubuntu:~# gluster v status
Status of volume: test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ubuntu:/home/gfs/b1                   49152     0          Y       7798
Brick ubuntu:/home/gfs/b2                   49153     0          Y       7818
Self-heal Daemon on localhost               N/A       N/A        Y       7839

Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks


root@ubuntu:~# gluster v set test cluster.consistent-metadata on
volume set: success

root@ubuntu:~# ls /mnt/test
ls: cannot access '/mnt/test': No such file or directory
root@ubuntu:~# mkdir -p /mnt/test
root@ubuntu:~# mount -t glusterfs ubuntu:/test /mnt/test

root@ubuntu:~# cd /mnt/test
root@ubuntu:/mnt/test# echo "abc">aaa
root@ubuntu:/mnt/test# cp aaa bbb;link bbb ccc

root@ubuntu:/mnt/test# kill -9 7818
root@ubuntu:/mnt/test# cp aaa ddd;link ddd eee
link: cannot create link 'eee' to 'ddd': No such file or directory


Best Regards,
George

From: [email protected] [mailto:[email protected]] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 18, 2018 2:40 PM

To: Lian, George (NSB - CN/Hangzhou) <[email protected]>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <[email protected]>; [email protected]; Li, Deqian (NSB - CN/Hangzhou) <[email protected]>; Sun, Ping (NSB - CN/Hangzhou) <[email protected]>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Thu, Jan 18, 2018 at 6:33 AM, Lian, George (NSB - CN/Hangzhou)
<[email protected]> wrote:
Hi,
I suppose the number of bricks in your testing is six, and you just shut down
3 of the processes.
When I reproduce the issue, I create a replicated volume with only 2 bricks,
let only ONE brick keep working, and set cluster.consistent-metadata on.
With these 2 test conditions, the issue is 100% reproducible.

Hi,
      I actually tried it with replica-2 and replica-3 and then distributed 
replica-2 before replying to the earlier mail. We can have a debugging session 
if you are okay with it.
I am in the middle of a customer issue myself (that is the reason for this
delay :-( ) and am thinking of wrapping it up early next week. Would that be
fine with you?




16:44:28 :) ⚡ gluster v status
Status of volume: r2
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick localhost.localdomain:/home/gfs/r2_0  49152     0          Y       5309
Brick localhost.localdomain:/home/gfs/r2_1  49154     0          Y       5330
Brick localhost.localdomain:/home/gfs/r2_2  49156     0          Y       5351
Brick localhost.localdomain:/home/gfs/r2_3  49158     0          Y       5372
Brick localhost.localdomain:/home/gfs/r2_4  49159     0          Y       5393
Brick localhost.localdomain:/home/gfs/r2_5  49160     0          Y       5414
Self-heal Daemon on localhost               N/A       N/A        Y       5436

Task Status of Volume r2
------------------------------------------------------------------------------
There are no active volume tasks

root@dhcp35-190 - ~
16:44:38 :) ⚡ kill -9 5309 5351 5393

Best Regards,
George
From: [email protected] [mailto:[email protected]] On Behalf Of Pranith Kumar Karampuri
Sent: Wednesday, January 17, 2018 7:27 PM
To: Lian, George (NSB - CN/Hangzhou) <[email protected]>
Cc: Li, Deqian (NSB - CN/Hangzhou) <[email protected]>; [email protected]; Zhou, Cynthia (NSB - CN/Hangzhou) <[email protected]>; Sun, Ping (NSB - CN/Hangzhou) <[email protected]>

Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Mon, Jan 15, 2018 at 1:55 PM, Pranith Kumar Karampuri
<[email protected]> wrote:


On Mon, Jan 15, 2018 at 8:46 AM, Lian, George (NSB - CN/Hangzhou)
<[email protected]> wrote:
Hi,

Have you reproduced this issue? If yes, could you please confirm whether it is 
an issue or not?

Hi,
       I tried recreating this on my laptop, on both master and 3.12, and I
am not able to recreate the issue :-(.
Here is the execution log:
https://paste.fedoraproject.org/paste/-csXUKrwsbrZAVW1KzggQQ
Since I was doing this on my laptop, I simulated the replica shutdown by
killing the brick process.
Let me know if I missed something.


Sorry, I am held up with some issue at work, so I think I will get some time
the day after tomorrow to look at this. In the meantime I am adding more
people who know about afr, to see if they get a chance to work on this before
me.


And if it is an issue, do you have any solution for it?

Thanks & Best Regards,
George

From: Lian, George (NSB - CN/Hangzhou)
Sent: Thursday, January 11, 2018 2:01 PM
To: Pranith Kumar Karampuri <[email protected]>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <[email protected]>; [email protected]; Li, Deqian (NSB - CN/Hangzhou) <[email protected]>; Sun, Ping (NSB - CN/Hangzhou) <[email protected]>
Subject: RE: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"

Hi,

Please see the detailed test steps at
https://bugzilla.redhat.com/show_bug.cgi?id=1531457

How reproducible:


Steps to Reproduce:
1. create a volume named "test" with type replicate
2. set the volume option cluster.consistent-metadata to on:
   gluster v set test cluster.consistent-metadata on
3. mount volume test on a client at /mnt/test
4. create a file aaa with size more than 1 byte:
   echo "1234567890" >/mnt/test/aaa
5. shut down one replica node, say sn-1, leaving only sn-0 working
6. cp /mnt/test/aaa /mnt/test/bbb; link /mnt/test/bbb /mnt/test/ccc


BRs
George

From: [email protected] [mailto:[email protected]] On Behalf Of Pranith Kumar Karampuri
Sent: Thursday, January 11, 2018 12:39 PM
To: Lian, George (NSB - CN/Hangzhou) <[email protected]>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <[email protected]>; [email protected]; Li, Deqian (NSB - CN/Hangzhou) <[email protected]>; Sun, Ping (NSB - CN/Hangzhou) <[email protected]>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Thu, Jan 11, 2018 at 6:35 AM, Lian, George (NSB - CN/Hangzhou)
<[email protected]> wrote:
Hi,
>>> In which protocol are you seeing this issue? Fuse/NFS/SMB?
It is FUSE, on a mountpoint created with the “mount -t glusterfs …” command.

Could you let me know the test you did so that I can try to re-create and see 
what exactly is going on?
Configuration of the volume and the steps to re-create the issue you are seeing 
would be helpful in debugging the issue further.


Thanks & Best Regards,
George

From: [email protected] [mailto:[email protected]] On Behalf Of Pranith Kumar Karampuri
Sent: Wednesday, January 10, 2018 8:08 PM
To: Lian, George (NSB - CN/Hangzhou) <[email protected]>
Cc: Zhou, Cynthia (NSB - CN/Hangzhou) <[email protected]>; Zhong, Hua (NSB - CN/Hangzhou) <[email protected]>; Li, Deqian (NSB - CN/Hangzhou) <[email protected]>; [email protected]; Sun, Ping (NSB - CN/Hangzhou) <[email protected]>
Subject: Re: [Gluster-devel] a link issue maybe introduced in a bug fix " Don't 
let NFS cache stat after writes"



On Wed, Jan 10, 2018 at 11:09 AM, Lian, George (NSB - CN/Hangzhou)
<[email protected]> wrote:
Hi, Pranith Kumar,

I have created a bug on Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1531457
After investigating this link issue, I suppose it was introduced by your
change to afr-dir-write.c for the issue "Don't let NFS cache stat after
writes". Your fix is like:
--------------------------------------
        if (afr_txn_nothing_failed (frame, this)) {
                /* if it did pre-op, it will do post-op changing ctime */
                if (priv->consistent_metadata &&
                    afr_needs_changelog_update (local))
                        afr_zero_fill_stat (local);
                local->transaction.unwind (frame, this);
        }
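
Presumably afr_zero_fill_stat (local) ends up calling gf_zero_fill_stat() on
each iatt about to be unwound; a minimal sketch of that idea (the function
below is illustrative, not the real afr implementation):

static void
afr_zero_fill_stat_sketch (struct iatt *prebuf, struct iatt *postbuf)
{
        /* Mark both stats in the reply as "do not cache" so that upper
         * layers fetch fresh attributes instead of trusting these. */
        gf_zero_fill_stat (prebuf);
        gf_zero_fill_stat (postbuf);
}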
In the above fix, ia_nlink is set to 0 when the option consistent-metadata is
set to “on”.
Hard-linking a file that was just created then leads to an error, caused by
this check in the kernel function vfs_link():
if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
        error = -ENOENT;

Could you please have a check and give your comments here?

When a stat is "zero filled", the understanding is that the higher-layer
protocol doesn't send the stat value to the kernel, and a separate lookup is
sent by the kernel to get the latest stat value. In which protocol are you
seeing this issue? Fuse/NFS/SMB?
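
For illustration, this is roughly what that would look like on the FUSE side
(a sketch under the assumption that fuse-bridge keys off
gf_is_zero_filled_stat; feo as a struct fuse_entry_out and the
calc_timeout_sec helper are taken as assumptions here):

        if (gf_is_zero_filled_stat (buf))
                /* zero attr-cache timeout: the kernel will not cache the
                 * attributes and will issue a fresh lookup instead */
                feo.attr_valid = feo.attr_valid_nsec = 0;
        else
                feo.attr_valid = calc_timeout_sec (priv->attribute_timeout);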


Thanks & Best Regards,
George



--
Pranith

_______________________________________________
Gluster-devel mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-devel
