Added Rafi and Raghavendra, who work on RDMA.

On Mon, Aug 8, 2016 at 7:58 AM, Dan Lavu <[email protected]> wrote:
> Hello,
>
> I'm having some major problems with Gluster and oVirt; I've been ripping
> my hair out over this, so if anybody can provide insight, that would be
> fantastic. I've tried both transports, TCP and RDMA, and both are having
> stability problems.
>
> The first problem I'm running into: intermittently, one specific node
> will get spammed with the following message:
>
> "[2016-08-08 00:42:50.837992] E [rpc-clnt.c:357:saved_frames_unwind]
> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fb728b0f293]
> (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7fb7288d73d1]
> (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb7288d74ee]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7fb7288d8d0e]
> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7fb7288d9528] )))))
> 0-vmdata1-client-0: forced unwinding frame type(GlusterFS 3.3)
> op(WRITE(13)) called at 2016-08-08 00:42:43.620710 (xid=0x6800b)"
>
> Then the InfiniBand device gets bounced and VMs get stuck.
>
> Another problem I'm seeing once a day, or every two days: an oVirt node
> will hang on its gluster mounts. Issuing a df to check the mounts will
> just stall; this occurs hourly if RDMA is used. Most of the time I can
> log into the hypervisor and remount the gluster volumes.
>
> This is on Fedora 23 with Gluster 3.8.1-1. The InfiniBand gear is 40Gb/s
> QDR QLogic using the ib_qib module; this configuration was working with
> our old InfiniHost III. I couldn't get OFED to compile, so all the
> InfiniBand modules are the ones Fedora ships.
>
> A volume looks like the following (please say if there is anything I
> need to adjust; the settings were pulled from several examples):
>
> Volume Name: vmdata_ha
> Type: Replicate
> Volume ID: 325a5fda-a491-4c40-8502-f89776a3c642
> Status: Started
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp,rdma
> Bricks:
> Brick1: deadpool.ib.runlevelone.lan:/gluster/vmdata_ha
> Brick2: spidey.ib.runlevelone.lan:/gluster/vmdata_ha
> Brick3: groot.ib.runlevelone.lan:/gluster/vmdata_ha (arbiter)
> Options Reconfigured:
> performance.least-prio-threads: 4
> performance.low-prio-threads: 16
> performance.normal-prio-threads: 24
> performance.high-prio-threads: 24
> cluster.self-heal-window-size: 32
> cluster.self-heal-daemon: on
> performance.md-cache-timeout: 1
> performance.cache-max-file-size: 2MB
> performance.io-thread-count: 32
> network.ping-timeout: 5
> performance.write-behind-window-size: 4MB
> performance.cache-size: 256MB
> performance.cache-refresh-timeout: 10
> server.allow-insecure: on
> network.remote-dio: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> storage.owner-gid: 36
> storage.owner-uid: 36
> performance.readdir-ahead: on
> nfs.disable: on
> config.transport: tcp,rdma
> performance.stat-prefetch: off
> cluster.eager-lock: enable
>
> Volume Name: vmdata1
> Type: Distribute
> Volume ID: 3afefcb3-887c-4315-b9dc-f4e890f786eb
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp,rdma
> Bricks:
> Brick1: spidey.ib.runlevelone.lan:/gluster/vmdata1
> Brick2: deadpool.ib.runlevelone.lan:/gluster/vmdata1
> Options Reconfigured:
> config.transport: tcp,rdma
> network.remote-dio: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> nfs.disable: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> performance.readdir-ahead: on
> server.allow-insecure: on
> performance.stat-prefetch: off
> performance.cache-refresh-timeout: 10
> performance.cache-size: 256MB
> performance.write-behind-window-size: 4MB
> network.ping-timeout: 5
> performance.io-thread-count: 32
> performance.cache-max-file-size: 2MB
> performance.md-cache-timeout: 1
> performance.high-prio-threads: 24
> performance.normal-prio-threads: 24
> performance.low-prio-threads: 16
> performance.least-prio-threads: 4
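A note on the listings above: both volumes set network.ping-timeout to 5
seconds, while the GlusterFS default is 42. With a timeout that short, a
brief stall on the fabric can be enough for the client to declare the
server dead and force-unwind in-flight frames, which is what the
saved_frames_unwind error above describes. That is only a hypothesis, but
it is cheap to test by restoring the default (volume names taken from the
listings; adjust as needed):

    # revert ping-timeout to the 42-second default on both volumes
    gluster volume set vmdata_ha network.ping-timeout 42
    gluster volume set vmdata1 network.ping-timeout 42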
>
> /etc/glusterfs/glusterd.vol
>
> volume management
>     type mgmt/glusterd
>     option working-directory /var/lib/glusterd
>     option transport-type socket,tcp
>     option transport.socket.keepalive-time 10
>     option transport.socket.keepalive-interval 2
>     option transport.socket.read-fail-log off
>     option ping-timeout 0
>     option event-threads 1
>     # option rpc-auth-allow-insecure on
>     option transport.socket.bind-address 0.0.0.0
>     # option transport.address-family inet6
>     # option base-port 49152
> end-volume
>
> I think that's a good start; thank you so much for taking the time to
> look at this. You can find me on freenode, nick side_control, if you
> want to chat. I'm GMT-5.
>
> Cheers,
>
> Dan
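Two more things that may be worth checking, offered as suggestions rather
than a diagnosis:

First, the volumes have server.allow-insecure on, but
rpc-auth-allow-insecure is still commented out in glusterd.vol above. For
oVirt/libgfapi access the two settings are commonly paired; if that is
the intent here, the change would be:

    # in /etc/glusterfs/glusterd.vol, uncomment:
    option rpc-auth-allow-insecure on
    # then, on each node:
    systemctl restart glusterd

Second, when a fuse mount has gone stale and df stalls, a lazy unmount
followed by a remount usually clears it without rebooting the hypervisor.
A sketch; /mnt/vmdata1 is a hypothetical mount point, substitute the path
oVirt actually uses:

    umount -l /mnt/vmdata1
    mount -t glusterfs spidey.ib.runlevelone.lan:/vmdata1 /mnt/vmdata1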
--
Pranith

_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
