Do you observe any event pattern (self-healing / disk failures / reboots etc.) after which the extended attributes are missing?
Regards,
Vijay

On Fri, Jul 7, 2017 at 5:28 PM, Ankireddypalle Reddy <are...@commvault.com> wrote:
> We lost the attributes on all the bricks on servers glusterfs2 and
> glusterfs3 again.
>
> [root@glusterfs2 Log_Files]# gluster volume info
>
> Volume Name: StoragePool
> Type: Distributed-Disperse
> Volume ID: 149e976f-4e21-451c-bf0f-f5691208531f
> Status: Started
> Number of Bricks: 20 x (2 + 1) = 60
> Transport-type: tcp
> Bricks:
> Brick1: glusterfs1sds:/ws/disk1/ws_brick
> Brick2: glusterfs2sds:/ws/disk1/ws_brick
> Brick3: glusterfs3sds:/ws/disk1/ws_brick
> Brick4: glusterfs1sds:/ws/disk2/ws_brick
> Brick5: glusterfs2sds:/ws/disk2/ws_brick
> Brick6: glusterfs3sds:/ws/disk2/ws_brick
> Brick7: glusterfs1sds:/ws/disk3/ws_brick
> Brick8: glusterfs2sds:/ws/disk3/ws_brick
> Brick9: glusterfs3sds:/ws/disk3/ws_brick
> Brick10: glusterfs1sds:/ws/disk4/ws_brick
> Brick11: glusterfs2sds:/ws/disk4/ws_brick
> Brick12: glusterfs3sds:/ws/disk4/ws_brick
> Brick13: glusterfs1sds:/ws/disk5/ws_brick
> Brick14: glusterfs2sds:/ws/disk5/ws_brick
> Brick15: glusterfs3sds:/ws/disk5/ws_brick
> Brick16: glusterfs1sds:/ws/disk6/ws_brick
> Brick17: glusterfs2sds:/ws/disk6/ws_brick
> Brick18: glusterfs3sds:/ws/disk6/ws_brick
> Brick19: glusterfs1sds:/ws/disk7/ws_brick
> Brick20: glusterfs2sds:/ws/disk7/ws_brick
> Brick21: glusterfs3sds:/ws/disk7/ws_brick
> Brick22: glusterfs1sds:/ws/disk8/ws_brick
> Brick23: glusterfs2sds:/ws/disk8/ws_brick
> Brick24: glusterfs3sds:/ws/disk8/ws_brick
> Brick25: glusterfs4sds.commvault.com:/ws/disk1/ws_brick
> Brick26: glusterfs5sds.commvault.com:/ws/disk1/ws_brick
> Brick27: glusterfs6sds.commvault.com:/ws/disk1/ws_brick
> Brick28: glusterfs4sds.commvault.com:/ws/disk10/ws_brick
> Brick29: glusterfs5sds.commvault.com:/ws/disk10/ws_brick
> Brick30: glusterfs6sds.commvault.com:/ws/disk10/ws_brick
> Brick31: glusterfs4sds.commvault.com:/ws/disk11/ws_brick
> Brick32: glusterfs5sds.commvault.com:/ws/disk11/ws_brick
> Brick33: glusterfs6sds.commvault.com:/ws/disk11/ws_brick
> Brick34: glusterfs4sds.commvault.com:/ws/disk12/ws_brick
> Brick35: glusterfs5sds.commvault.com:/ws/disk12/ws_brick
> Brick36: glusterfs6sds.commvault.com:/ws/disk12/ws_brick
> Brick37: glusterfs4sds.commvault.com:/ws/disk2/ws_brick
> Brick38: glusterfs5sds.commvault.com:/ws/disk2/ws_brick
> Brick39: glusterfs6sds.commvault.com:/ws/disk2/ws_brick
> Brick40: glusterfs4sds.commvault.com:/ws/disk3/ws_brick
> Brick41: glusterfs5sds.commvault.com:/ws/disk3/ws_brick
> Brick42: glusterfs6sds.commvault.com:/ws/disk3/ws_brick
> Brick43: glusterfs4sds.commvault.com:/ws/disk4/ws_brick
> Brick44: glusterfs5sds.commvault.com:/ws/disk4/ws_brick
> Brick45: glusterfs6sds.commvault.com:/ws/disk4/ws_brick
> Brick46: glusterfs4sds.commvault.com:/ws/disk5/ws_brick
> Brick47: glusterfs5sds.commvault.com:/ws/disk5/ws_brick
> Brick48: glusterfs6sds.commvault.com:/ws/disk5/ws_brick
> Brick49: glusterfs4sds.commvault.com:/ws/disk6/ws_brick
> Brick50: glusterfs5sds.commvault.com:/ws/disk6/ws_brick
> Brick51: glusterfs6sds.commvault.com:/ws/disk6/ws_brick
> Brick52: glusterfs4sds.commvault.com:/ws/disk7/ws_brick
> Brick53: glusterfs5sds.commvault.com:/ws/disk7/ws_brick
> Brick54: glusterfs6sds.commvault.com:/ws/disk7/ws_brick
> Brick55: glusterfs4sds.commvault.com:/ws/disk8/ws_brick
> Brick56: glusterfs5sds.commvault.com:/ws/disk8/ws_brick
> Brick57: glusterfs6sds.commvault.com:/ws/disk8/ws_brick
> Brick58: glusterfs4sds.commvault.com:/ws/disk9/ws_brick
> Brick59: glusterfs5sds.commvault.com:/ws/disk9/ws_brick
> Brick60: glusterfs6sds.commvault.com:/ws/disk9/ws_brick
> Options Reconfigured:
> performance.readdir-ahead: on
> diagnostics.client-log-level: INFO
> auth.allow: glusterfs1sds,glusterfs2sds,glusterfs3sds,glusterfs4sds.commvault.com,glusterfs5sds.commvault.com,glusterfs6sds.commvault.com
>
> Thanks and Regards,
> Ram
>
> From: Pranith Kumar Karampuri [mailto:pkara...@redhat.com]
> Sent: Friday, July 07, 2017 12:15 PM
> To: Ankireddypalle Reddy
> Cc: Gluster Devel (gluster-devel@gluster.org); gluster-us...@gluster.org
> Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
> On Fri, Jul 7, 2017 at 9:25 PM, Ankireddypalle Reddy <are...@commvault.com> wrote:
>
> 3.7.19
>
> These are the only callers for removexattr, and only _posix_remove_xattr
> has the potential to do a removexattr, as posix_removexattr already makes
> sure that it is not gfid/volume-id. And surprise surprise,
> _posix_remove_xattr happens only from the healing code of afr/ec. And this
> can only happen if the source brick doesn't have the gfid, which doesn't
> seem to match the situation you explained.
>
> #  line  filename / context / line
> 1  1234  xlators/mgmt/glusterd/src/glusterd-quota.c <<glusterd_remove_quota_limit>>
>          ret = sys_lremovexattr (abspath, QUOTA_LIMIT_KEY);
> 2  1243  xlators/mgmt/glusterd/src/glusterd-quota.c <<glusterd_remove_quota_limit>>
>          ret = sys_lremovexattr (abspath, QUOTA_LIMIT_OBJECTS_KEY);
> 3  6102  xlators/mgmt/glusterd/src/glusterd-utils.c <<glusterd_check_and_set_brick_xattr>>
>          sys_lremovexattr (path, "trusted.glusterfs.test");
> 4    80  xlators/storage/posix/src/posix-handle.h <<REMOVE_PGFID_XATTR>>
>          op_ret = sys_lremovexattr (path, key); \
> 5  5026  xlators/storage/posix/src/posix.c <<_posix_remove_xattr>>
>          op_ret = sys_lremovexattr (filler->real_path, key);
> 6  5101  xlators/storage/posix/src/posix.c <<posix_removexattr>>
>          op_ret = sys_lremovexattr (real_path, name);
> 7  6811  xlators/storage/posix/src/posix.c <<init>>
>          sys_lremovexattr (dir_data->data, "trusted.glusterfs.test");
>
> So there are only two possibilities:
> 1) The source directory in ec/afr doesn't have the gfid.
> 2) Something else removed these xattrs.
>
> What is your volume info? Maybe that will give more clues.
>
> PS: sys_fremovexattr is called only from posix_fremovexattr(), so that
> doesn't seem to be the culprit, as it also has checks to guard against
> gfid/volume-id removal.
>
> Thanks and Regards,
> Ram
>
> From: Pranith Kumar Karampuri [mailto:pkara...@redhat.com]
> Sent: Friday, July 07, 2017 11:54 AM
> To: Ankireddypalle Reddy
> Cc: Gluster Devel (gluster-devel@gluster.org); gluster-us...@gluster.org
> Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
> On Fri, Jul 7, 2017 at 9:20 PM, Ankireddypalle Reddy <are...@commvault.com> wrote:
>
> Pranith,
>          Thanks for looking into the issue. The bricks were mounted after
> the reboot. One more thing that I noticed was that when the attributes
> were manually set while glusterd was up, they were lost again on starting
> the volume. We had to stop glusterd, set the attributes, and then start
> glusterd. After that the volume start succeeded.
>
> Which version is this?
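[For readers hitting the same symptom: the two attributes under discussion
can be inspected directly on a brick root. A minimal sketch, assuming the
brick paths from the volume info above; exact values differ per volume:

    # Dump all xattrs on a brick root in hex
    # (-m . is needed; by default getfattr only matches user.*)
    getfattr -d -m . -e hex /ws/disk1/ws_brick

    # A healthy brick root should include:
    #   trusted.gfid=0x00000000000000000000000000000001   (root gfid, fixed)
    #   trusted.glusterfs.volume-id=0x149e976f4e21451cbf0ff5691208531f
    #   (the Volume ID from 'gluster volume info', with dashes removed)
]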
>
> Thanks and Regards,
> Ram
>
> From: Pranith Kumar Karampuri [mailto:pkara...@redhat.com]
> Sent: Friday, July 07, 2017 11:46 AM
> To: Ankireddypalle Reddy
> Cc: Gluster Devel (gluster-devel@gluster.org); gluster-us...@gluster.org
> Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
> Did anything special happen on these two bricks? It can't happen in the
> I/O path. posix_removexattr() has:
>
>         if (!strcmp (GFID_XATTR_KEY, name)) {
>                 gf_msg (this->name, GF_LOG_WARNING, 0, P_MSG_XATTR_NOT_REMOVED,
>                         "Remove xattr called on gfid for file %s", real_path);
>                 op_ret = -1;
>                 goto out;
>         }
>         if (!strcmp (GF_XATTR_VOL_ID_KEY, name)) {
>                 gf_msg (this->name, GF_LOG_WARNING, 0, P_MSG_XATTR_NOT_REMOVED,
>                         "Remove xattr called on volume-id for file %s", real_path);
>                 op_ret = -1;
>                 goto out;
>         }
>
> I just found that op_errno is not set correctly, but it can't happen in
> the I/O path, so self-heal/rebalance are off the hook.
>
> I also grepped for any removexattr of trusted.gfid from glusterd and
> didn't find any.
>
> So one thing that used to happen was that sometimes when machines
> rebooted, the brick mounts wouldn't happen, and this would lead to the
> absence of both trusted.gfid and volume-id. So at the moment this is my
> wild guess.
>
> On Fri, Jul 7, 2017 at 8:39 PM, Ankireddypalle Reddy <are...@commvault.com> wrote:
>
> Hi,
>         We faced an issue in production today. We had to stop the volume
> and reboot all the servers in the cluster. Once the servers rebooted,
> starting the volume failed because the following extended attributes were
> not present on all the bricks on 2 servers:
>
> 1) trusted.gfid
> 2) trusted.glusterfs.volume-id
>
> We had to manually set these extended attributes to start the volume. Are
> there any such known issues?
>
> Thanks and Regards,
> Ram
>
> --
> Pranith
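[To make the manual repair Ram describes concrete: restoring the two
attributes on an affected brick would look roughly like the following.
This is a sketch only, not the exact commands used here; it assumes
/ws/disk1/ws_brick is the affected brick, takes the volume ID from the
volume info above, and stops glusterd first, per Ram's observation that
attributes set while glusterd is running were lost again:

    systemctl stop glusterd   # or the service equivalent on your distribution

    # volume-id is the Volume ID from 'gluster volume info',
    # hex-encoded with the dashes removed
    setfattr -n trusted.glusterfs.volume-id \
             -v 0x149e976f4e21451cbf0ff5691208531f /ws/disk1/ws_brick

    # the gfid of a brick root is always the fixed root gfid
    setfattr -n trusted.gfid \
             -v 0x00000000000000000000000000000001 /ws/disk1/ws_brick

    systemctl start glusterd
]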
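[Pranith's guess about brick mounts not happening after a reboot is easy to
rule in or out before touching any xattrs. A sketch, assuming the /ws/diskN
layout above; adjust the disk range per server:

    # If the brick filesystem never mounted, the brick path sits on the
    # root filesystem and will legitimately lack trusted.glusterfs.volume-id.
    for d in /ws/disk{1..8}; do
        mountpoint -q "$d" || echo "$d is NOT a separate mount"
    done
]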
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel