On 03/23/2015 11:28 AM, Jonathan Heese wrote:
> On Mar 23, 2015, at 1:20 AM, "Mohammed Rafi K C" <rkavu...@redhat.com
> <mailto:rkavu...@redhat.com>> wrote:
>
>> On 03/21/2015 07:49 PM, Jonathan Heese wrote:
>>>
>>> Mohammed,
>>>
>>> I have completed the steps you suggested (unmount all, stop the
>>> volume, set config.transport to tcp, start the volume, mount,
>>> etc.), and the behavior has indeed changed.
>>>
>>> [root@duke ~]# gluster volume info
>>>
>>> Volume Name: gluster_disk
>>> Type: Replicate
>>> Volume ID: 2307a5a8-641e-44f4-8eaf-7cc2b704aafd
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: duke-ib:/bricks/brick1
>>> Brick2: duchess-ib:/bricks/brick1
>>> Options Reconfigured:
>>> config.transport: tcp
>>>
>>> [root@duke ~]# gluster volume status
>>> Status of volume: gluster_disk
>>> Gluster process                                 Port    Online  Pid
>>> ------------------------------------------------------------------------------
>>> Brick duke-ib:/bricks/brick1                    49152   Y       16362
>>> Brick duchess-ib:/bricks/brick1                 49152   Y       14155
>>> NFS Server on localhost                         2049    Y       16374
>>> Self-heal Daemon on localhost                   N/A     Y       16381
>>> NFS Server on duchess-ib                        2049    Y       14167
>>> Self-heal Daemon on duchess-ib                  N/A     Y       14174
>>>
>>> Task Status of Volume gluster_disk
>>> ------------------------------------------------------------------------------
>>> There are no active volume tasks
>>>
>>> I am no longer seeing the I/O errors during prolonged periods of
>>> write I/O that I was seeing when the transport was set to rdma.
>>> However, I am seeing this message on both nodes every 3 seconds
>>> (almost exactly):
>>>
>>> ==> /var/log/glusterfs/nfs.log <==
>>> [2015-03-21 14:17:40.379719] W
>>> [rdma.c:1076:gf_rdma_cm_event_handler] 0-gluster_disk-client-1: cma
>>> event RDMA_CM_EVENT_REJECTED, error 8 (me:10.10.10.1:1023
>>> peer:10.10.10.2:49152)
>>>
>>> Is this something to worry about?
>>>
>> If you are not using nfs to export the volumes, there is nothing to
>> worry about.
>
> I'm using the native glusterfs FUSE component to mount the volume
> locally on both servers -- I assume that you're referring to the
> standard NFS protocol stuff, which I'm not using here.
>
> Incidentally, I would like to keep my logs from filling up with junk
> if possible. Is there something I can do to get rid of these
> (useless?) error messages?
If I understand correctly, you are now getting this flood of log messages
in the nfs log only, and all the other logs and everything else are fine,
right? If that is the case, and you are not using nfs to export the volume
at all, then as a workaround you can disable nfs for your volume or
cluster (gluster volume set <volname> nfs.disable on). This will turn off
the gluster nfs server, and you will no longer get those log messages.

>>> Any idea why there are rdma pieces in play when I've set my
>>> transport to tcp?
>>>
>> There should not be any rdma pieces. If possible, can you paste the
>> volfile for the nfs server? You can find the volfile in
>> /var/lib/glusterd/nfs/nfs-server.vol or
>> /usr/local/var/lib/glusterd/nfs/nfs-server.vol
>
> I will get this for you when I can.

Thanks. If you can get it, that will be a great help in understanding the
problem.

Rafi KC

> Regards,
> Jon Heese
>
>> Rafi KC
>>>
>>> The actual I/O appears to be handled properly and I've seen no
>>> further errors in the testing I've done so far.
>>>
>>> Thanks.
>>>
>>> Regards,
>>> Jon Heese
>>>
>>> ------------------------------------------------------------------------
>>> *From:* gluster-users-boun...@gluster.org
>>> <gluster-users-boun...@gluster.org> on behalf of Jonathan Heese
>>> <jhe...@inetu.net>
>>> *Sent:* Friday, March 20, 2015 7:04 AM
>>> *To:* Mohammed Rafi K C
>>> *Cc:* gluster-users
>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>
>>> Mohammed,
>>>
>>> Thanks very much for the reply. I will try that and report back.
>>>
>>> Regards,
>>> Jon Heese
>>>
>>> On Mar 20, 2015, at 3:26 AM, "Mohammed Rafi K C"
>>> <rkavu...@redhat.com <mailto:rkavu...@redhat.com>> wrote:
>>>
>>>> On 03/19/2015 10:16 PM, Jonathan Heese wrote:
>>>>>
>>>>> Hello all,
>>>>>
>>>>> Does anyone else have any further suggestions for troubleshooting
>>>>> this?
>>>>>
>>>>> To sum up: I have a 2 node 2 brick replicated volume, which holds
>>>>> a handful of iSCSI image files which are mounted and served up by
>>>>> tgtd (CentOS 6) to a handful of devices on a dedicated iSCSI
>>>>> network. The most important iSCSI clients (initiators) are four
>>>>> VMware ESXi 5.5 hosts that use the iSCSI volumes as backing for
>>>>> their datastores for virtual machine storage.
>>>>>
>>>>> After a few minutes of sustained writing to the volume, I am
>>>>> seeing a massive flood (over 1500 per second at times) of this
>>>>> error in /var/log/glusterfs/mnt-gluster-disk.log:
>>>>>
>>>>> [2015-03-16 02:24:07.582801] W
>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 635358:
>>>>> WRITE => -1 (Input/output error)
>>>>>
>>>>> When this happens, the ESXi box fails its write operation and
>>>>> returns an error to the effect of "Unable to write data to
>>>>> datastore". I don't see anything else in the supporting logs to
>>>>> explain the root cause of the i/o errors.
>>>>>
>>>>> Any and all suggestions are appreciated. Thanks.
>>>>>
>>>> From the mount logs, I assume that your volume transport type is
>>>> rdma. There are some known issues with rdma in 3.5.3, and the
>>>> patches to address those issues have already been sent upstream
>>>> [1]. From the logs alone, I'm not sure, and it is hard to tell
>>>> whether this problem is related to the rdma transport or not. To
>>>> make sure that the tcp transport works well in this scenario, if
>>>> possible can you try to reproduce the same issue using a tcp type
>>>> volume?
>>>> You can change the transport type of the volume with the following
>>>> steps (not recommended in the normal use case):
>>>>
>>>> 1) unmount every client
>>>> 2) stop the volume
>>>> 3) run gluster volume set volname config.transport tcp
>>>> 4) start the volume again
>>>> 5) mount the clients
>>>>
>>>> [1] : http://goo.gl/2PTL61
>>>>
>>>> Regards
>>>> Rafi KC
>>>>
>>>>> /Jon Heese/
>>>>> /Systems Engineer/
>>>>> *INetU Managed Hosting*
>>>>> P: 610.266.7441 x 261
>>>>> F: 610.266.7434
>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>
>>>>> /** This message contains confidential information, which also may
>>>>> be privileged, and is intended only for the person(s) addressed
>>>>> above. Any unauthorized use, distribution, copying or disclosure
>>>>> of confidential and/or privileged information is strictly
>>>>> prohibited. If you have received this communication in error,
>>>>> please erase all copies of the message and its attachments and
>>>>> notify the sender immediately via reply e-mail. **/
>>>>>
>>>>> *From:* Jonathan Heese
>>>>> *Sent:* Tuesday, March 17, 2015 12:36 PM
>>>>> *To:* 'Ravishankar N'; gluster-users@gluster.org
>>>>> *Subject:* RE: [Gluster-users] I/O error on replicated volume
>>>>>
>>>>> Ravi,
>>>>>
>>>>> The last lines in the mount log before the massive flood of I/O
>>>>> errors are from 22 minutes prior, and seem innocuous to me:
>>>>>
>>>>> [2015-03-16 01:37:07.126340] E
>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>> 0-gluster_disk-client-0: failed to get the port number for remote
>>>>> subvolume. Please run 'gluster volume status' on server to see if
>>>>> brick process is running.
>>>>>
>>>>> [2015-03-16 01:37:07.126587] W [rdma.c:4273:gf_rdma_disconnect]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>> [0x7fd9c557bccf]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>> [0x7fd9c557a995]
>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>>>> (peer:10.10.10.1:24008)
>>>>>
>>>>> [2015-03-16 01:37:07.126687] E
>>>>> [client-handshake.c:1760:client_query_portmap_cbk]
>>>>> 0-gluster_disk-client-1: failed to get the port number for remote
>>>>> subvolume. Please run 'gluster volume status' on server to see if
>>>>> brick process is running.
>>>>> [2015-03-16 01:37:07.126737] W [rdma.c:4273:gf_rdma_disconnect]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>> [0x7fd9c557bccf]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>> [0x7fd9c557a995]
>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>>>> (peer:10.10.10.2:24008)
>>>>>
>>>>> [2015-03-16 01:37:10.730165] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
>>>>> 0-gluster_disk-client-0: changing port to 49152 (from 0)
>>>>>
>>>>> [2015-03-16 01:37:10.730276] W [rdma.c:4273:gf_rdma_disconnect]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>> [0x7fd9c557bccf]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>> [0x7fd9c557a995]
>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-0: disconnect called
>>>>> (peer:10.10.10.1:24008)
>>>>>
>>>>> [2015-03-16 01:37:10.739500] I [rpc-clnt.c:1729:rpc_clnt_reconfig]
>>>>> 0-gluster_disk-client-1: changing port to 49152 (from 0)
>>>>>
>>>>> [2015-03-16 01:37:10.739560] W [rdma.c:4273:gf_rdma_disconnect]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x13f)
>>>>> [0x7fd9c557bccf]
>>>>> (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)
>>>>> [0x7fd9c557a995]
>>>>> (-->/usr/lib64/glusterfs/3.5.3/xlator/protocol/client.so(client_query_portmap_cbk+0x1ea)
>>>>> [0x7fd9c0d8fb9a]))) 0-gluster_disk-client-1: disconnect called
>>>>> (peer:10.10.10.2:24008)
>>>>>
>>>>> [2015-03-16 01:37:10.741883] I
>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>> 0-gluster_disk-client-0: Using Program GlusterFS 3.3, Num
>>>>> (1298437), Version (330)
>>>>>
>>>>> [2015-03-16 01:37:10.744524] I
>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>> 0-gluster_disk-client-0: Connected to 10.10.10.1:49152, attached
>>>>> to remote volume '/bricks/brick1'.
>>>>>
>>>>> [2015-03-16 01:37:10.744537] I
>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>> 0-gluster_disk-client-0: Server and Client lk-version numbers are
>>>>> not same, reopening the fds
>>>>>
>>>>> [2015-03-16 01:37:10.744566] I [afr-common.c:4267:afr_notify]
>>>>> 0-gluster_disk-replicate-0: Subvolume 'gluster_disk-client-0' came
>>>>> back up; going online.
>>>>>
>>>>> [2015-03-16 01:37:10.744627] I
>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>> 0-gluster_disk-client-0: Server lk version = 1
>>>>>
>>>>> [2015-03-16 01:37:10.753037] I
>>>>> [client-handshake.c:1677:select_server_supported_programs]
>>>>> 0-gluster_disk-client-1: Using Program GlusterFS 3.3, Num
>>>>> (1298437), Version (330)
>>>>>
>>>>> [2015-03-16 01:37:10.755657] I
>>>>> [client-handshake.c:1462:client_setvolume_cbk]
>>>>> 0-gluster_disk-client-1: Connected to 10.10.10.2:49152, attached
>>>>> to remote volume '/bricks/brick1'.
>>>>> [2015-03-16 01:37:10.755676] I
>>>>> [client-handshake.c:1474:client_setvolume_cbk]
>>>>> 0-gluster_disk-client-1: Server and Client lk-version numbers are
>>>>> not same, reopening the fds
>>>>>
>>>>> [2015-03-16 01:37:10.761945] I
>>>>> [fuse-bridge.c:5016:fuse_graph_setup] 0-fuse: switched to graph 0
>>>>>
>>>>> [2015-03-16 01:37:10.762144] I
>>>>> [client-handshake.c:450:client_set_lk_version_cbk]
>>>>> 0-gluster_disk-client-1: Server lk version = 1
>>>>>
>>>>> [*2015-03-16 01:37:10.762279*] I [fuse-bridge.c:3953:fuse_init]
>>>>> 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs
>>>>> 7.22 kernel 7.14
>>>>>
>>>>> [*2015-03-16 01:59:26.098670*] W
>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 292084:
>>>>> WRITE => -1 (Input/output error)
>>>>>
>>>>> …
>>>>>
>>>>> I've seen no indication of split-brain on any files at any point
>>>>> in this (ever since downgrading from 3.6.2 to 3.5.3, which is when
>>>>> this particular issue started):
>>>>>
>>>>> [root@duke gfapi-module-for-linux-target-driver-]# gluster v heal
>>>>> gluster_disk info
>>>>>
>>>>> Brick duke.jonheese.local:/bricks/brick1/
>>>>> Number of entries: 0
>>>>>
>>>>> Brick duchess.jonheese.local:/bricks/brick1/
>>>>> Number of entries: 0
>>>>>
>>>>> Thanks.
>>>>>
>>>>> /Jon Heese/
>>>>> /Systems Engineer/
>>>>> *INetU Managed Hosting*
>>>>> P: 610.266.7441 x 261
>>>>> F: 610.266.7434
>>>>> www.inetu.net <https://www.inetu.net/>
>>>>>
>>>>> *From:* Ravishankar N [mailto:ravishan...@redhat.com]
>>>>> *Sent:* Tuesday, March 17, 2015 12:35 AM
>>>>> *To:* Jonathan Heese; gluster-users@gluster.org
>>>>> <mailto:gluster-users@gluster.org>
>>>>> *Subject:* Re: [Gluster-users] I/O error on replicated volume
>>>>>
>>>>> On 03/17/2015 02:14 AM, Jonathan Heese wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> So I resolved my previous issue with split-brains and the lack
>>>>> of self-healing by downgrading my installed glusterfs* packages
>>>>> from 3.6.2 to 3.5.3, but now I've picked up a new issue, which
>>>>> actually makes normal use of the volume practically impossible.
>>>>>
>>>>> A little background for those not already paying close attention:
>>>>> I have a 2 node 2 brick replicating volume whose purpose in
>>>>> life is to hold iSCSI target files, primarily for use to
>>>>> provide datastores to a VMware ESXi cluster. The plan is to
>>>>> put a handful of image files on the Gluster volume, mount them
>>>>> locally on both Gluster nodes, and run tgtd on both, pointed
>>>>> to the image files on the mounted gluster volume. Then the
>>>>> ESXi boxes will use multipath (active/passive) iSCSI to
>>>>> connect to the nodes, with automatic failover in case of
>>>>> planned or unplanned downtime of the Gluster nodes.
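[As an aside, the tgtd half of the plan described above boils down to a
few lines of targets.conf. Here is a minimal sketch for CentOS 6 with
scsi-target-utils; the IQN, image filename, and initiator address are
made up for illustration, not taken from this thread:]

    # Hypothetical /etc/tgt/targets.conf entry: export an image file
    # that lives on the FUSE-mounted gluster volume as an iSCSI LUN.
    cat >> /etc/tgt/targets.conf <<'EOF'
    <target iqn.2015-03.local.example:gluster-datastore1>
        # backing store on the gluster FUSE mount
        backing-store /mnt/gluster_disk/datastore1.img
        # limit access to the ESXi initiator on the iSCSI network
        initiator-address 10.10.9.21
    </target>
    EOF

    # Re-read the target definitions (targets with active sessions are
    # skipped unless --force is given).
    tgt-admin --update ALL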
>>>>> In my most recent round of testing with 3.5.3, I'm seeing a
>>>>> massive failure to write data to the volume after about 5-10
>>>>> minutes, so I've simplified the scenario a bit (to minimize
>>>>> the variables) to: both Gluster nodes up, only one node (duke)
>>>>> mounted and running tgtd, and just regular (single path) iSCSI
>>>>> from a single ESXi server.
>>>>>
>>>>> About 5-10 minutes into migrating a VM onto the test
>>>>> datastore, /var/log/messages on duke gets blasted with a ton
>>>>> of messages exactly like this:
>>>>>
>>>>> Mar 15 22:24:06 duke tgtd: bs_rdwr_request(180) io error
>>>>> 0x1781e00 2a -1 512 22971904, Input/output error
>>>>>
>>>>> And /var/log/glusterfs/mnt-gluster_disk.log gets blasted with a
>>>>> ton of messages exactly like this:
>>>>>
>>>>> [2015-03-16 02:24:07.572279] W
>>>>> [fuse-bridge.c:2242:fuse_writev_cbk] 0-glusterfs-fuse: 635299:
>>>>> WRITE => -1 (Input/output error)
>>>>>
>>>>> Are there any messages in the mount log from AFR about split-brain
>>>>> just before the above line appears?
>>>>> Does `gluster v heal <VOLNAME> info` show any files? Performing
>>>>> I/O on files that are in split-brain fails with EIO.
>>>>>
>>>>> -Ravi
>>>>>
>>>>> And the write operation from VMware's side fails as soon as
>>>>> these messages start.
>>>>>
>>>>> I don't see any other errors (in the log files I know of)
>>>>> indicating the root cause of these i/o errors. I'm sure that
>>>>> this is not enough information to tell what's going on, but
>>>>> can anyone help me figure out what to look at next to figure
>>>>> this out?
>>>>>
>>>>> I've also considered using Dan Lambright's libgfapi gluster
>>>>> module for tgtd (or something similar) to avoid going through
>>>>> FUSE, but I'm not sure whether that would be irrelevant to
>>>>> this problem, since I'm not 100% sure if it lies in FUSE or
>>>>> elsewhere.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> /Jon Heese/
>>>>> /Systems Engineer/
>>>>> *INetU Managed Hosting*
>>>>> P: 610.266.7441 x 261
>>>>> F: 610.266.7434
>>>>> www.inetu.net <https://www.inetu.net/>
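[For quick reference, the two procedures discussed in this thread,
collapsed into a shell sketch. The volume name is from the thread; the
mount point and server name are inferred from the logs
(mnt-gluster_disk.log, duke-ib) and may differ on the real systems:]

    # Rafi's workaround: turn off the gluster NFS server for the volume,
    # since nothing mounts it over NFS -- this stops the recurring
    # RDMA_CM_EVENT_REJECTED messages in nfs.log.
    gluster volume set gluster_disk nfs.disable on

    # The earlier transport change from rdma to tcp, end to end.
    umount /mnt/gluster_disk            # first, on every client
    gluster volume stop gluster_disk
    gluster volume set gluster_disk config.transport tcp
    gluster volume start gluster_disk
    mount -t glusterfs duke-ib:/gluster_disk /mnt/gluster_disk

    # If rdma fragments still show up afterwards, inspect the volfile
    # that glusterd generated for its NFS server:
    grep -i rdma /var/lib/glusterd/nfs/nfs-server.vol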
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users