On 03/17/2013 06:55 PM, Marc Seeger wrote:
Hi,
We just ran into dbench dying during one of our test runs.
We run one dbench instance on each of two machines, with the following parameters:
dbench 6 -t 60 -D $DIRECTORY
(6 clients, a 60-second run; $DIRECTORY is host-specific, so each machine writes to its own directory.)
The directories are on a GlusterFS 3.3.1 mount (package 3.3.1-ubuntu1~lucid8 from
https://launchpad.net/~semiosis/+archive/ubuntu-glusterfs-3.3).
This is how dbench died:
I, [2013-03-16T05:34:03.176890 #13121] INFO -- : [710] rename
/mnt/gfs/something.example.com_1363412031/clients/client2/~dmtmp/PWRPNT/NEWPCB.PPT
/mnt/gfs/something.example.com_1363412031/clients/client2/~dmtmp/PWRPNT/PPTB1E4.TMP
failed (No such file or directory) - expected NT_STATUS_OK
These are the logs from that time. They are a bit noisy; the matching message is
emphasised using *****:
[2013-03-16 05:34:03.082813] I [afr-inode-write.c:428:afr_open_fd_fix]
0-replicate0: Opening fd 0x7f1adb67f274
[2013-03-16 05:34:03.082813] W [client3_1-fops.c:1595:client3_1_entrylk_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.082813] W [client3_1-fops.c:418:client3_1_open_cbk]
0-remote9: remote operation failed: No such file or directory. Path:
/something.example.com_1363412031/clients/client2/~dmtmp/PWRPNT/PPTB1E4.TMP
(b49d6051-93f6-4eca-b161-865a5bea964b)
[2013-03-16 05:34:03.082813] I [afr-inode-write.c:428:afr_open_fd_fix]
0-replicate0: Opening fd 0x7f1adb67f4cc
[2013-03-16 05:34:03.082813] I [afr-inode-write.c:428:afr_open_fd_fix]
0-replicate0: Opening fd 0x7f1adb67f6c0
[2013-03-16 05:34:03.082813] I [afr-inode-write.c:428:afr_open_fd_fix]
0-replicate0: Opening fd 0x7f1adb67f468
[2013-03-16 05:34:03.082813] I [afr-inode-write.c:428:afr_open_fd_fix]
0-replicate0: Opening fd 0x7f1adb67f33c
[2013-03-16 05:34:03.082813] W [client3_1-fops.c:881:client3_1_flush_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.092814] W [client3_1-fops.c:418:client3_1_open_cbk]
0-remote9: remote operation failed: No such file or directory. Path:
/something.example.com_1363412031/clients/client2/~dmtmp/PWRPNT/ZD16.BMP
(73e3b099-48cd-4e76-8049-c64bf8f63500)
[2013-03-16 05:34:03.092814] W [client3_1-fops.c:418:client3_1_open_cbk]
0-remote9: remote operation failed: No such file or directory. Path:
/something.example.com_1363412031/clients/client2/~dmtmp/PWRPNT/NEWPCB.PPT
(ba53fb9f-0648-4794-aaa9-bba9331b52cb)
[2013-03-16 05:34:03.092814] W [client3_1-fops.c:418:client3_1_open_cbk]
0-remote9: remote operation failed: No such file or directory. Path:
/something.example.com_1363412031/clients/client2/~dmtmp/PWRPNT/PCBENCHM.PPT
(a0c96e9a-4d4a-4984-9892-ff0b2ecbb7e3)
[2013-03-16 05:34:03.092814] W [client3_1-fops.c:418:client3_1_open_cbk]
0-remote9: remote operation failed: No such file or directory. Path:
/something.example.com_1363412031/clients/client4/~dmtmp/PWRPNT/PPTB1E4.TMP
(2b8f1677-6376-4286-a381-8f4897bc9f4a)
[2013-03-16 05:34:03.092814] I [afr-inode-write.c:428:afr_open_fd_fix]
0-replicate0: Opening fd 0x7f1adb67f594
[2013-03-16 05:34:03.092814] I [afr-inode-write.c:428:afr_open_fd_fix]
0-replicate0: Opening fd 0x7f1adb67f3a0
[2013-03-16 05:34:03.092814] I [afr-inode-write.c:428:afr_open_fd_fix]
0-replicate0: Opening fd 0x7f1adb67f2d8
[2013-03-16 05:34:03.112816] W [client3_1-fops.c:418:client3_1_open_cbk]
0-remote9: remote operation failed: No such file or directory. Path:
/something.example.com_1363412031/clients/client4/~dmtmp/PWRPNT/ZD16.BMP
(eafa5f6a-fe12-4b9c-a5b9-386f2ff2123f)
[2013-03-16 05:34:03.112816] W [client3_1-fops.c:418:client3_1_open_cbk]
0-remote9: remote operation failed: No such file or directory. Path:
/something.example.com_1363412031/clients/client4/~dmtmp/PWRPNT/NEWPCB.PPT
(8c99ede1-3782-49f0-b544-00f4ec3beb9b)
[2013-03-16 05:34:03.112816] W [client3_1-fops.c:418:client3_1_open_cbk]
0-remote9: remote operation failed: No such file or directory. Path:
/something.example.com_1363412031/clients/client4/~dmtmp/PWRPNT/PCBENCHM.PPT
(a725ede8-bc10-42a1-9622-55afad13f9f7)
[2013-03-16 05:34:03.112816] W [client3_1-fops.c:881:client3_1_flush_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.112816] W [client3_1-fops.c:881:client3_1_flush_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.112816] W [client3_1-fops.c:881:client3_1_flush_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.112816] W [client3_1-fops.c:881:client3_1_flush_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.112816] W [client3_1-fops.c:881:client3_1_flush_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.112816] W [client3_1-fops.c:881:client3_1_flush_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.112816] W [client3_1-fops.c:881:client3_1_flush_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.132819] W [client3_1-fops.c:1595:client3_1_entrylk_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.132819] W [client3_1-fops.c:1595:client3_1_entrylk_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.132819] W [client3_1-fops.c:1595:client3_1_entrylk_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.132819] W [client3_1-fops.c:1595:client3_1_entrylk_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.142820] W [client3_1-fops.c:418:client3_1_open_cbk]
0-remote9: remote operation failed: No such file or directory. Path:
/something.example.com_1363412031/clients/client2/~dmtmp/PWRPNT/NEWPCB.PPT
(ba53fb9f-0648-4794-aaa9-bba9331b52cb)
[2013-03-16 05:34:03.142820] I [afr-inode-write.c:428:afr_open_fd_fix]
0-replicate0: Opening fd 0x7f1adb67f788
[2013-03-16 05:34:03.142820] W [client3_1-fops.c:418:client3_1_open_cbk]
0-remote9: remote operation failed: No such file or directory. Path:
/something.example.com_1363412031/clients/client2/~dmtmp/PWRPNT/NEWPCB.PPT
(ba53fb9f-0648-4794-aaa9-bba9331b52cb)
[2013-03-16 05:34:03.142820] W [client3_1-fops.c:881:client3_1_flush_cbk]
0-remote9: remote operation failed: No such file or directory
[2013-03-16 05:34:03.142820] W [client3_1-fops.c:2546:client3_1_opendir_cbk]
0-remote9: remote operation failed: No such file or directory. Path:
/something.example.com_1363412031/clients/client2/~dmtmp/PWRPNT
(6512393c-65b8-4d86-ae78-8a12eb2be395)
[2013-03-16 05:34:03.172824] W [client3_1-fops.c:1595:client3_1_entrylk_cbk]
0-remote9: remote operation failed: No such file or directory
***********************
[2013-03-16 05:34:03.172824] W [fuse-bridge.c:1516:fuse_rename_cbk] 0-glusterfs-fuse:
11218: /something.example.com_1363412031/clients/client2/~dmtmp/PWRPNT/NEWPCB.PPT
-> /something.example.com_1363412031/clients/client2/~dmtmp/PWRPNT/PPTB1E4.TMP
=> -1 (No such file or directory)
***********************
[2013-03-16 05:34:03.232831] I [afr-inode-write.c:428:afr_open_fd_fix]
0-replicate0: Opening fd 0x7f1adb67f404
[2013-03-16 05:34:03.242832] I [afr-open.c:318:afr_openfd_fix_open_cbk]
0-replicate0: fd for
/something.example.com_1363412031/clients/client0/~dmtmp/PWRPNT/PCBENCHM.PPT
opened successfully on subvolume remote9
[2013-03-16 05:34:03.252834] I [afr-inode-write.c:428:afr_open_fd_fix]
0-replicate0: Opening fd 0x7f1adb67f5f8
[2013-03-16 05:34:03.262835] I [afr-open.c:318:afr_openfd_fix_open_cbk]
0-replicate0: fd for
/something.example.com_1363412031/clients/client3/~dmtmp/PWRPNT/PCBENCHM.PPT
opened successfully on subvolume remote9
[2013-03-16 05:36:21.547011] C
[client-handshake.c:126:rpc_client_ping_timer_expired] 0-remote8: server
10.245.15.65:24007 has not responded in the last 42 seconds, disconnecting.
[2013-03-16 05:36:21.547011] C
[client-handshake.c:126:rpc_client_ping_timer_expired] 0-remote9: server
10.196.239.242:24007 has not responded in the last 42 seconds, disconnecting.
[2013-03-16 05:36:21.547011] E [rpc-clnt.c:373:saved_frames_unwind]
(-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f1adab0a048]
(-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0) [0x7f1adab09d00]
(-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f1adab0976e]))) 0-remote8:
forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2013-03-16
05:35:10.750385 (xid=0x18942x)
[2013-03-16 05:36:21.547011] W [client3_1-fops.c:2630:client3_1_lookup_cbk]
0-remote8: remote operation failed: Transport endpoint is not connected. Path:
/ (00000000-0000-0000-0000-000000000001)
[2013-03-16 05:36:21.547011] E [rpc-clnt.c:373:saved_frames_unwind]
(-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f1adab0a048]
(-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0) [0x7f1adab09d00]
(-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f1adab0976e]))) 0-remote8:
forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2013-03-16
05:35:18.191110 (xid=0x18943x)
[2013-03-16 05:36:21.547011] W [client3_1-fops.c:2630:client3_1_lookup_cbk]
0-remote8: remote operation failed: Transport endpoint is not connected. Path:
/ (00000000-0000-0000-0000-000000000001)
[2013-03-16 05:36:21.547011] E [rpc-clnt.c:373:saved_frames_unwind]
(-->/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7f1adab0a048]
(-->/usr/lib/libgfrpc.so.0(rpc_clnt_connection_cleanup+0xb0) [0x7f1adab09d00]
(-->/usr/lib/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7f1adab0976e]))) 0-remote8:
forced unwinding frame type(GlusterFS Handshake) op(PING(3)) called at 2013-03-16
05:35:39.543151 (xid=0x18944x)
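To cut through the noise, a log like the one above can be reduced to its warnings, errors and critical messages. The following is a minimal Python sketch, assuming each entry occupies a single line in the original log file (the wrapping here looks like it comes from the mail client) and that entries follow the `[timestamp] LEVEL [file.c:line:function] translator: message` layout seen in the excerpt. The regular expression and the helper names `parse` and `summarize` are illustrative only, not part of GlusterFS.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Iterator

# Assumed single-line layout, as seen in the excerpt:
# [2013-03-16 05:34:03.082813] W [client3_1-fops.c:418:client3_1_open_cbk] 0-remote9: remote operation failed: ...
ENTRY = re.compile(
    r"^\[(?P<ts>[\d-]+ [\d:.]+)\] (?P<level>[A-Z]) "
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\] "
    r"(?P<xlator>\S+): (?P<msg>.*)$"
)


def parse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per line matching the assumed layout.

    Lines that do not fit (wrapped continuations, or entries whose message
    starts with a backtrace, such as saved_frames_unwind) are skipped.
    """
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            yield match.groupdict()


def summarize(lines: Iterable[str], levels: str = "WEC") -> Counter:
    """Count entries at the given levels (W=warning, E=error, C=critical)
    per (translator, function) pair."""
    return Counter(
        (entry["xlator"], entry["func"])
        for entry in parse(lines)
        if entry["level"] in levels
    )
```

Run over the excerpt, a summary like this makes it easier to see that, before the rename failure, the warnings come almost entirely from `0-remote9` (its open, flush and entrylk callbacks).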
Does anybody have an idea what could cause these errors?
The rpc_client_ping_timer_expired timeouts seem a bit strange. They occur after the
failure, and an earlier test in our suite exercises network problems, so they might
simply be left over from that.
Cheers,
Marc
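The ordering Marc describes can be checked directly from the timestamps. The Python sketch below uses only the standard library; the timestamps are copied from the excerpt, while the variable names and the `ts` helper are illustrative. It computes how long after the failed rename the disconnects were logged, and how long the unwound PING frame on `0-remote8` had been outstanding.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"  # timestamp layout used in the GlusterFS log above


def ts(text: str) -> datetime:
    """Parse a log timestamp such as '2013-03-16 05:34:03.172824'."""
    return datetime.strptime(text, FMT)


# Timestamps copied from the log excerpt in this thread.
rename_failed = ts("2013-03-16 05:34:03.172824")  # fuse_rename_cbk => -1 (ENOENT)
ping_called = ts("2013-03-16 05:35:39.543151")    # PING(3) frame later unwound on 0-remote8
ping_expired = ts("2013-03-16 05:36:21.547011")   # rpc_client_ping_timer_expired

# 42 s is the window quoted in the log message itself.
quoted_window = timedelta(seconds=42)

gap_after_rename = ping_expired - rename_failed
ping_wait = ping_expired - ping_called

print(f"disconnect came {gap_after_rename.total_seconds():.3f} s after the failed rename")
print(f"PING frame waited {ping_wait.total_seconds():.3f} s "
      f"(window: {quoted_window.total_seconds():.0f} s)")
```

It reports a gap of about 138.374 s between the failed rename and the disconnect, and about 42.004 s for the PING frame, which lines up with the 42-second window in the `rpc_client_ping_timer_expired` message. That supports reading the timeouts as a separate, later event rather than the cause of the rename failure, although timestamps alone cannot prove it.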
_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Hi,
If obtaining the entry locks failed on any of the bricks in a
replica subvolume, rename used to fail. This bug is fixed in 3.4alpha.
Pranith.
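The failure mode described in the reply can be modelled in a few lines. This is a hypothetical Python sketch, not GlusterFS's AFR code: `LockResult` and `rename_pre_34` are invented names and `other-brick` is a placeholder. Only the `remote9` name and the ENOENT error come from the log, where `client3_1_entrylk_cbk` reports "No such file or directory" on `0-remote9` at the same timestamp at which `fuse_rename_cbk` returns -1. The thread does not describe how 3.4alpha changed the behaviour, so no fixed variant is shown.

```python
import errno
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LockResult:
    """Outcome of an entry-lock (entrylk) request on one brick of a replica set."""
    brick: str                   # subvolume name, e.g. "remote9" from the log
    error: Optional[int] = None  # None: lock granted; otherwise an errno value


def rename_pre_34(results: List[LockResult]) -> int:
    """Pre-3.4 behaviour as described in the reply: if the entry lock fails on
    ANY brick of the replica subvolume, the rename fails as a whole.
    Returns 0 on success, or -errno from the first brick whose lock failed."""
    for result in results:
        if result.error is not None:
            return -result.error
    return 0


if __name__ == "__main__":
    healthy = [LockResult("other-brick"), LockResult("remote9")]
    one_missing = [LockResult("other-brick"), LockResult("remote9", errno.ENOENT)]
    print(rename_pre_34(healthy))                       # 0
    print(rename_pre_34(one_missing) == -errno.ENOENT)  # True
```

In this model a single brick reporting ENOENT is enough to turn the whole rename into `-1 (No such file or directory)`, which matches what dbench saw.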