Can you please share all the Gluster log files from the server where the
“Transport endpoint is not connected” error is reported? Since restarting
glusterd didn’t solve the issue, I believe this isn’t a stale-port problem
but something else. Please also provide the output of ‘gluster v info
<volname>’.
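For what it’s worth, a minimal sketch of gathering those on the affected server (assuming the default /var/log/glusterfs log directory and the ovirt_engine volume from the report below; both are placeholders, adjust for your setup):

```shell
#!/bin/sh
# Bundle the Gluster log directory and capture volume info on the
# affected server. LOGDIR and VOLNAME are assumptions -- change them
# to match your environment.
LOGDIR=${LOGDIR:-/var/log/glusterfs}
VOLNAME=${VOLNAME:-ovirt_engine}

# Tar up the whole log directory if it exists
if [ -d "$LOGDIR" ]; then
    tar czf "gluster-logs-$(hostname -s).tar.gz" \
        -C "$(dirname "$LOGDIR")" "$(basename "$LOGDIR")"
fi

# Capture the volume configuration (skipped if the CLI is absent)
if command -v gluster >/dev/null 2>&1; then
    gluster volume info "$VOLNAME" > "volinfo-$VOLNAME.txt"
fi
```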

(@cc Ravi, Karthik)

On Fri, 31 Aug 2018 at 23:24, Johnson, Tim <[email protected]> wrote:

> Hello all,
>
>
>
>       We have Gluster replicate (with arbiter) volumes that are reporting
> “Transport endpoint is not connected” on a rotating basis from each of the
> two file servers and from a third host that holds the arbiter bricks.
>
> This happens when trying to run a heal on all the volumes on the gluster
> hosts. When I get the status of all the volumes, everything looks good.
>
>        This behavior seems to foreshadow the gluster volumes becoming
> unresponsive to our VM cluster. In addition, one of the file servers has
> two processes for each volume instead of one per volume. Eventually the
> affected file server will drop off the listed peers.
>
> Restarting glusterd/glusterfsd on the affected file server does not fix
> the issue; we have to bring down both file servers because the volumes are
> no longer seen by the VM cluster after the errors start occurring. I had
> seen bug reports about “Transport endpoint is not connected” on earlier
> versions of Gluster, but I thought it had been addressed.
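One way to confirm the duplicate brick processes is to count glusterfsd instances per brick path (a sketch: glusterfsd carries the brick path in its --brick-name argument; the helper name here is my own):

```shell
# Count glusterfsd processes per brick path; any count above 1 for the
# same path points at a stale duplicate brick process.
# Usage on a file server:  ps -o args= -C glusterfsd | count_brick_procs
count_brick_procs() {
    grep -oE -e '--brick-name [^ ]+' | sort | uniq -c
}
```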
>
>      Dmesg did have some entries for “a possible SYN flood on port *”, so
> we set “net.ipv4.tcp_max_syn_backlog = 2048” via sysctl, which seemed to
> quiet the SYN flood messages but not the underlying volume issues.
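For reference, the runtime and persistent forms of that change (a sketch; the drop-in file name is an arbitrary example):

```shell
# Raise the SYN backlog at runtime (root; takes effect immediately,
# but is lost on reboot):
sysctl -w net.ipv4.tcp_max_syn_backlog=2048

# Persist it across reboots via a sysctl drop-in -- the file name
# 90-syn-backlog.conf is just an example:
cat > /etc/sysctl.d/90-syn-backlog.conf <<'EOF'
net.ipv4.tcp_max_syn_backlog = 2048
EOF
sysctl --system    # reload settings from all configuration files
```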
>
>     I have put the versions of all the installed Gluster packages below,
> as well as the output of the “heal” and “status” commands for the volumes.
>
>
>
>        This has just started happening but cannot definitively say if this
> started occurring after an update or not.
>
> Thanks for any assistance.
>
> Running Heal  :
>
>
>
> gluster volume heal ovirt_engine info
>
> Brick ****1.rrc.local:/bricks/brick0/ovirt_engine
>
> Status: Connected
>
> Number of entries: 0
>
>
>
> Brick ****3.rrc.local:/bricks/brick0/ovirt_engine
>
> Status: Transport endpoint is not connected
>
> Number of entries: -
>
>
>
> Brick *****3.rrc.local:/bricks/arb-brick/ovirt_engine
>
> Status: Transport endpoint is not connected
>
> Number of entries: -
>
>
>
>
>
> Running status :
>
>
>
> gluster volume status ovirt_engine
>
> Status of volume: ovirt_engine
>
> Gluster process                                        TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick *****.rrc.local:/bricks/brick0/ovirt_engine      49152     0          Y       5521
> Brick fs2-tier3.rrc.local:/bricks/brick0/ovirt_engine  49152     0          Y       6245
> Brick ****.rrc.local:/bricks/arb-brick/ovirt_engine    49152     0          Y       3526
> Self-heal Daemon on localhost                          N/A       N/A        Y       5509
> Self-heal Daemon on ***.rrc.local                      N/A       N/A        Y       6218
> Self-heal Daemon on ***.rrc.local                      N/A       N/A        Y       3501
> Self-heal Daemon on ****.rrc.local                     N/A       N/A        Y       3657
> Self-heal Daemon on *****.rrc.local                    N/A       N/A        Y       3753
> Self-heal Daemon on ****.rrc.local                     N/A       N/A        Y       17284
>
>
>
> Task Status of Volume ovirt_engine
>
>
> ------------------------------------------------------------------------------
>
> There are no active volume tasks
>
>
> /etc/glusterd.vol :
>
>
>
>
>
> volume management
>
>     type mgmt/glusterd
>
>     option working-directory /var/lib/glusterd
>
>     option transport-type socket,rdma
>
>     option transport.socket.keepalive-time 10
>
>     option transport.socket.keepalive-interval 2
>
>     option transport.socket.read-fail-log off
>
>     option ping-timeout 0
>
>     option event-threads 1
>
>     option rpc-auth-allow-insecure on
>
> #   option transport.address-family inet6
>
> #   option base-port 49152
>
> end-volume
>
>
>
> rpm -qa |grep gluster
>
> glusterfs-3.12.13-1.el7.x86_64
>
> glusterfs-gnfs-3.12.13-1.el7.x86_64
>
> glusterfs-api-3.12.13-1.el7.x86_64
>
> glusterfs-cli-3.12.13-1.el7.x86_64
>
> glusterfs-client-xlators-3.12.13-1.el7.x86_64
>
> glusterfs-fuse-3.12.13-1.el7.x86_64
>
> centos-release-gluster312-1.0-2.el7.centos.noarch
>
> glusterfs-rdma-3.12.13-1.el7.x86_64
>
> glusterfs-libs-3.12.13-1.el7.x86_64
>
> glusterfs-server-3.12.13-1.el7.x86_64
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> https://lists.gluster.org/mailman/listinfo/gluster-users

-- 
- Atin (atinm)