Vijay,

Yes, it turned out to be an xfs problem. After upgrading the xfs code, I haven't seen this problem again.
Thanks a lot!
Paul

On Fri, Nov 17, 2017 at 12:08 AM, Vijay Bellur <[email protected]> wrote:
>
> On Thu, Nov 16, 2017 at 6:23 AM, Paul <[email protected]> wrote:
>> Hi,
>>
>> I have a 5-node GlusterFS cluster with a Distributed-Replicate volume. There are 180 bricks in total. The OS is CentOS 6.5 and GlusterFS is 3.11.0. Many bricks go offline when we generate some empty files and rename them, and I see an xfs call trace on every node.
>>
>> For example:
>> Nov 16 11:15:12 node10 kernel: XFS (rdc00d28p2): Internal error xfs_trans_cancel at line 1948 of file fs/xfs/xfs_trans.c. Caller 0xffffffffa04e33f9
>> Nov 16 11:15:12 node10 kernel:
>> Nov 16 11:15:12 node10 kernel: Pid: 9939, comm: glusterfsd Tainted: G --------------- H 2.6.32-prsys.1.1.0.13.x86_64 #1
>> Nov 16 11:15:12 node10 kernel: Call Trace:
>> Nov 16 11:15:12 node10 kernel: [<ffffffffa04c803f>] ? xfs_error_report+0x3f/0x50 [xfs]
>> Nov 16 11:15:12 node10 kernel: [<ffffffffa04e33f9>] ? xfs_rename+0x2c9/0x6c0 [xfs]
>> Nov 16 11:15:12 node10 kernel: [<ffffffffa04e5e39>] ? xfs_trans_cancel+0xd9/0x100 [xfs]
>> Nov 16 11:15:12 node10 kernel: [<ffffffffa04e33f9>] ? xfs_rename+0x2c9/0x6c0 [xfs]
>> Nov 16 11:15:12 node10 kernel: [<ffffffff811962c5>] ? mntput_no_expire+0x25/0xb0
>> Nov 16 11:15:12 node10 kernel: [<ffffffffa04f5a06>] ? xfs_vn_rename+0x66/0x70 [xfs]
>> Nov 16 11:15:12 node10 kernel: [<ffffffff81184580>] ? vfs_rename+0x2a0/0x500
>> Nov 16 11:15:12 node10 kernel: [<ffffffff81182cd6>] ? generic_permission+0x16/0xa0
>> Nov 16 11:15:12 node10 kernel: [<ffffffff811882d9>] ? sys_renameat+0x369/0x420
>> Nov 16 11:15:12 node10 kernel: [<ffffffff81185f06>] ? final_putname+0x26/0x50
>> Nov 16 11:15:12 node10 kernel: [<ffffffff81186189>] ? putname+0x29/0x40
>> Nov 16 11:15:12 node10 kernel: [<ffffffff811861f9>] ? user_path_at+0x59/0xa0
>> Nov 16 11:15:12 node10 kernel: [<ffffffff8151dc79>] ? unroll_tree_refs+0x16/0xbc
>> Nov 16 11:15:12 node10 kernel: [<ffffffff810d1698>] ? audit_syscall_entry+0x2d8/0x300
>> Nov 16 11:15:12 node10 kernel: [<ffffffff811883ab>] ? sys_rename+0x1b/0x20
>> Nov 16 11:15:12 node10 kernel: [<ffffffff8100b032>] ? system_call_fastpath+0x16/0x1b
>> Nov 16 11:15:12 node10 kernel: XFS (rdc00d28p2): xfs_do_force_shutdown(0x8) called from line 1949 of file fs/xfs/xfs_trans.c. Return address = 0xffffffffa04e5e52
>> Nov 16 11:15:12 node10 kernel: XFS (rdc00d28p2): Corruption of in-memory data detected. Shutting down filesystem
>> Nov 16 11:15:12 node10 kernel: XFS (rdc00d28p2): Please umount the filesystem and rectify the problem(s)
>> Nov 16 11:15:30 node10 disks-FAvUzxiL-brick[29742]: [2017-11-16 11:15:30.206208] M [MSGID: 113075] [posix-helpers.c:1891:posix_health_check_thread_proc] 0-data-posix: health-check failed, going down
>> Nov 16 11:15:30 node10 disks-FAvUzxiL-brick[29742]: [2017-11-16 11:15:30.206538] M [MSGID: 113075] [posix-helpers.c:1908:posix_health_check_thread_proc] 0-data-posix: still alive! -> SIGTERM
>> Nov 16 11:15:37 node10 kernel: XFS (sdm): xfs_log_force: error 5 returned.
>> Nov 16 11:16:07 node10 kernel: XFS (sdm): xfs_log_force: error 5 returned.
>>
>
> As the logs indicate, xfs shut down and the posix health check feature in Gluster rendered the brick offline. You would be better off checking with the xfs community about this problem.
>
> Regards,
> Vijay
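
For anyone who wants to approximate the workload that triggered the shutdown (creating empty files and then renaming them, which is what shows up as xfs_rename in the trace), a minimal Python sketch is below. The mount point and file count are placeholders of my own choosing, not values from Paul's cluster, and the script is only an illustration of the access pattern, not the exact job that was running.

import os

# Rough reproduction sketch: create empty files and immediately rename them.
# MOUNT_POINT and FILE_COUNT are assumptions (hypothetical values), not taken
# from the original cluster; point MOUNT_POINT at a Gluster client mount.
MOUNT_POINT = "/mnt/gluster/data"
FILE_COUNT = 10000

def create_and_rename(count=FILE_COUNT):
    for i in range(count):
        src = os.path.join(MOUNT_POINT, "empty_%d.tmp" % i)
        dst = os.path.join(MOUNT_POINT, "empty_%d.dat" % i)
        open(src, "w").close()   # create an empty file
        os.rename(src, dst)      # rename it (the operation seen in the call trace)

if __name__ == "__main__":
    create_and_rename()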
