On Wed, Aug 16, 2017 at 4:44 PM, Hatazaki, Takao <[email protected]> wrote:
>> Note that "stripe" is not tested much and practically unmaintained.
>
> Ah, this was what I suspected.  Understood.  I'll be happy with "shard".
>
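For what it's worth, sharding is enabled per volume rather than at create
time. A minimal sketch, assuming the volume is named gv0; the block size
here is an arbitrary example value:

    gluster volume set gv0 features.shard on
    gluster volume set gv0 features.shard-block-size 64MB
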
> Having said that, "stripe" works fine with transport=tcp.  The failure 
> reproduces with just 2 RDMA servers (with InfiniBand), one of which also 
> acts as the client.
>
> I looked into the logs.  I paste the lengthy logs below, hoping that mail 
> systems will not automatically fold the lines...
>
> Takao
>
> ---
>
> Immediately after starting the "gluster" interactive command, the following 
> appeared in cli.log.  The last line repeats every 3 seconds.
>
> [2017-08-16 10:49:00.028789] I [cli.c:759:main] 0-cli: Started running 
> gluster with version 3.10.3
> [2017-08-16 10:49:00.032509] I 
> [cli-cmd-volume.c:2320:cli_check_gsync_present] 0-: geo-replication not 
> installed
> [2017-08-16 10:49:00.033038] I [MSGID: 101190] 
> [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with 
> index 1
> [2017-08-16 10:49:00.033092] I [socket.c:2415:socket_event_handler] 
> 0-transport: EPOLLERR - disconnecting now
> [2017-08-16 10:49:03.032434] I [socket.c:2415:socket_event_handler] 
> 0-transport: EPOLLERR - disconnecting now
>
> When I do:
>
> gluster> volume create gv0 stripe 2 transport rdma 
> gluster-s1-fdr:/data/brick1/gv0 gluster-s2-fdr:/data/brick1/gv0
> volume create: gv0: success: please start the volume to access data
> gluster> volume start gv0
> volume start: gv0: success
>
> The following appeared in glusterd.log.  Note the "E" (error) entry at the end.
>
> [2017-08-16 10:38:48.451329] I [MSGID: 106062] 
> [glusterd-volume-ops.c:2617:glusterd_op_start_volume] 0-management: Global 
> dict not present.
> [2017-08-16 10:38:48.751913] I [MSGID: 106143] 
> [glusterd-pmap.c:277:pmap_registry_bind] 0-pmap: adding brick 
> /data/brick1/gv0.rdma on port 49152
> [2017-08-16 10:38:48.752222] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 
> 0-management: setting frame-timeout to 600
> [2017-08-16 10:38:48.915868] I [rpc-clnt.c:1059:rpc_clnt_connection_init] 
> 0-snapd: setting frame-timeout to 600
> [2017-08-16 10:38:48.915977] I [MSGID: 106132] 
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
> [2017-08-16 10:38:48.916008] I [MSGID: 106568] 
> [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: nfs service is 
> stopped
> [2017-08-16 10:38:48.916189] I [MSGID: 106132] 
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already 
> stopped
> [2017-08-16 10:38:48.916210] I [MSGID: 106568] 
> [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: bitd service is 
> stopped
> [2017-08-16 10:38:48.916232] I [MSGID: 106132] 
> [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already 
> stopped
> [2017-08-16 10:38:48.916245] I [MSGID: 106568] 
> [glusterd-svc-mgmt.c:228:glusterd_svc_stop] 0-management: scrub service is 
> stopped
> [2017-08-16 10:38:49.392687] I [run.c:191:runner_log] 
> (-->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdbd7a) 
> [0x7fbb107e5d7a] 
> -->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdb83d) 
> [0x7fbb107e583d] -->/lib64/libglusterfs.so.0(runner_log+0x115) 
> [0x7fbb1bc5c385] ) 0-management: Ran script: 
> /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh --volname=gv0 
> --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
> [2017-08-16 10:38:49.402177] E [run.c:191:runner_log] 
> (-->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdbd7a) 
> [0x7fbb107e5d7a] 
> -->/usr/lib64/glusterfs/3.10.3/xlator/mgmt/glusterd.so(+0xdb79b) 
> [0x7fbb107e579b] -->/lib64/libglusterfs.so.0(runner_log+0x115) 
> [0x7fbb1bc5c385] ) 0-management: Failed to execute script: 
> /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh --volname=gv0 
> --first=yes --version=1 --volume-op=start --gd-workdir=/var/lib/glusterd
>
> Looks like this was related to Samba, which I do not use.  The same E error 
> happens even when I use transport=tcp.  There were no errors in the brick 
> logs.
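
The failing script is the Samba post-start hook, which glusterd runs after
every volume start. If you do not use Samba it should be harmless; one
possible way to silence it (a sketch, assuming you really do not need the
Samba hooks) is to clear the script's execute bit, or move it out of the
hooks directory, so that glusterd no longer runs it:

    chmod -x /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh

> Below is what was written to data-brick1-gv0.log: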
>
> [2017-08-16 10:59:24.127902] I [MSGID: 100030] [glusterfsd.c:2475:main] 
> 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.10.3 
> (args: /usr/sbin/glusterfsd -s gluster-s1-fdr --volfile-id 
> gv0.gluster-s1-fdr.data-brick1-gv0 -p 
> /var/lib/glusterd/vols/gv0/run/gluster-s1-fdr-data-brick1-gv0.pid -S 
> /var/run/gluster/6b6de65a92fa07146541a9474ffa2fd2.socket --brick-name 
> /data/brick1/gv0 -l /var/log/glusterfs/bricks/data-brick1-gv0.log 
> --xlator-option *-posix.glusterd-uuid=5c750a8f-c45b-4a7e-af84-16c1999874b7 
> --brick-port 49152 --xlator-option gv0-server.listen-port=49152 
> --volfile-server-transport=rdma)
> [2017-08-16 10:59:24.134054] I [MSGID: 101190] 
> [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with 
> index 1
> [2017-08-16 10:59:24.137118] I 
> [rpcsvc.c:2237:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured 
> rpc.outstanding-rpc-limit with value 64
> [2017-08-16 10:59:24.138384] W [MSGID: 101002] 
> [options.c:954:xl_opt_validate] 0-gv0-server: option 'listen-port' is 
> deprecated, preferred is 'transport.rdma.listen-port', continuing with 
> correction
> [2017-08-16 10:59:24.142207] I [MSGID: 121050] 
> [ctr-helper.c:259:extract_ctr_options] 0-gfdbdatastore: CTR Xlator is 
> disabled.
> [2017-08-16 10:59:24.237783] I [trash.c:2493:init] 0-gv0-trash: no option 
> specified for 'eliminate', using NULL
> [2017-08-16 10:59:24.239129] W [MSGID: 101174] 
> [graph.c:361:_log_if_unknown_option] 0-gv0-server: option 
> 'rpc-auth.auth-glusterfs' is not recognized
> [2017-08-16 10:59:24.239189] W [MSGID: 101174] 
> [graph.c:361:_log_if_unknown_option] 0-gv0-server: option 
> 'rpc-auth.auth-unix' is not recognized
> [2017-08-16 10:59:24.239203] W [MSGID: 101174] 
> [graph.c:361:_log_if_unknown_option] 0-gv0-server: option 
> 'rpc-auth.auth-null' is not recognized
> [2017-08-16 10:59:24.239226] W [MSGID: 101174] 
> [graph.c:361:_log_if_unknown_option] 0-gv0-server: option 'auth-path' is not 
> recognized
> [2017-08-16 10:59:24.239235] W [MSGID: 101174] 
> [graph.c:361:_log_if_unknown_option] 0-/data/brick1/gv0: option 
> 'auth.addr./data/brick1/gv0.allow' is not recognized
> [2017-08-16 10:59:24.239251] W [MSGID: 101174] 
> [graph.c:361:_log_if_unknown_option] 0-/data/brick1/gv0: option 'auth-path' 
> is not recognized
> [2017-08-16 10:59:24.239257] W [MSGID: 101174] 
> [graph.c:361:_log_if_unknown_option] 0-/data/brick1/gv0: option 
> 'auth.login.2d6e8c76-47ed-4ac4-87ff-f96693f048b5.password' is not recognized
> [2017-08-16 10:59:24.239263] W [MSGID: 101174] 
> [graph.c:361:_log_if_unknown_option] 0-/data/brick1/gv0: option 
> 'auth.login./data/brick1/gv0.allow' is not recognized
> [2017-08-16 10:59:24.239276] W [MSGID: 101174] 
> [graph.c:361:_log_if_unknown_option] 0-gv0-quota: option 'timeout' is not 
> recognized
> [2017-08-16 10:59:24.239311] W [MSGID: 101174] 
> [graph.c:361:_log_if_unknown_option] 0-gv0-trash: option 'brick-path' is not 
> recognized
> Final graph:
> +------------------------------------------------------------------------------+
>   1: volume gv0-posix
>   2:     type storage/posix
>   3:     option glusterd-uuid 5c750a8f-c45b-4a7e-af84-16c1999874b7
>   4:     option directory /data/brick1/gv0
>   5:     option volume-id 6491a59c-866f-4a1d-b21b-f894ea0e50cd
>   6: end-volume
>   7:
>   8: volume gv0-trash
>   9:     type features/trash
>  10:     option trash-dir .trashcan
>  11:     option brick-path /data/brick1/gv0
>  12:     option trash-internal-op off
>  13:     subvolumes gv0-posix
>  14: end-volume
>  15:
>  16: volume gv0-changetimerecorder
>  17:     type features/changetimerecorder
>  18:     option db-type sqlite3
>  19:     option hot-brick off
>  20:     option db-name gv0.db
>  21:     option db-path /data/brick1/gv0/.glusterfs/
>  22:     option record-exit off
>  23:     option ctr_link_consistency off
>  24:     option ctr_lookupheal_link_timeout 300
>  25:     option ctr_lookupheal_inode_timeout 300
>  26:     option record-entry on
>  27:     option ctr-enabled off
>  28:     option record-counters off
>  29:     option ctr-record-metadata-heat off
>  30:     option sql-db-cachesize 12500
>  31:     option sql-db-wal-autocheckpoint 25000
>  32:     subvolumes gv0-trash
>  33: end-volume
>  34:
>  35: volume gv0-changelog
>  36:     type features/changelog
>  37:     option changelog-brick /data/brick1/gv0
>  38:     option changelog-dir /data/brick1/gv0/.glusterfs/changelogs
>  39:     option changelog-barrier-timeout 120
>  40:     subvolumes gv0-changetimerecorder
>  41: end-volume
>  42:
>  43: volume gv0-bitrot-stub
>  44:     type features/bitrot-stub
>  45:     option export /data/brick1/gv0
>  46:     subvolumes gv0-changelog
>  47: end-volume
>  48:
>  49: volume gv0-access-control
>  50:     type features/access-control
>  51:     subvolumes gv0-bitrot-stub
>  52: end-volume
>  53:
>  54: volume gv0-locks
>  55:     type features/locks
>  56:     subvolumes gv0-access-control
>  57: end-volume
>  58:
>  59: volume gv0-worm
>  60:     type features/worm
>  61:     option worm off
>  62:     option worm-file-level off
>  63:     subvolumes gv0-locks
>  64: end-volume
>  65:
>  66: volume gv0-read-only
>  67:     type features/read-only
>  68:     option read-only off
>  69:     subvolumes gv0-worm
>  70: end-volume
>  71:
>  72: volume gv0-leases
>  73:     type features/leases
>  74:     option leases off
>  75:     subvolumes gv0-read-only
>  76: end-volume
>  77:
>  78: volume gv0-upcall
>  79:     type features/upcall
>  80:     option cache-invalidation off
>  81:     subvolumes gv0-leases
>  82: end-volume
>  83:
>  84: volume gv0-io-threads
>  85:     type performance/io-threads
>  86:     subvolumes gv0-upcall
>  87: end-volume
>  88:
>  89: volume gv0-marker
>  90:     type features/marker
>  91:     option volume-uuid 6491a59c-866f-4a1d-b21b-f894ea0e50cd
>  92:     option timestamp-file /var/lib/glusterd/vols/gv0/marker.tstamp
>  93:     option quota-version 0
>  94:     option xtime off
>  95:     option gsync-force-xtime off
>  96:     option quota off
>  97:     option inode-quota off
>  98:     subvolumes gv0-io-threads
>  99: end-volume
> 100:
> 101: volume gv0-barrier
> 102:     type features/barrier
> 103:     option barrier disable
> 104:     option barrier-timeout 120
> 105:     subvolumes gv0-marker
> 106: end-volume
> 107:
> 108: volume gv0-index
> 109:     type features/index
> 110:     option index-base /data/brick1/gv0/.glusterfs/indices
> 111:     subvolumes gv0-barrier
> 112: end-volume
> 113:
> 114: volume gv0-quota
> 115:     type features/quota
> 116:     option volume-uuid gv0
> 117:     option server-quota off
> 118:     option timeout 0
> 119:     option deem-statfs off
> 120:     subvolumes gv0-index
> 121: end-volume
> 122:
> 123: volume gv0-io-stats
> 124:     type debug/io-stats
> 125:     option unique-id /data/brick1/gv0
> 126:     option log-level INFO
> 127:     option latency-measurement off
> 128:     option count-fop-hits off
> 129:     subvolumes gv0-quota
> 130: end-volume
> 131:
> 132: volume /data/brick1/gv0
> 133:     type performance/decompounder
> 134:     option auth.addr./data/brick1/gv0.allow *
> 135:     option auth-path /data/brick1/gv0
> 136:     option auth.login.2d6e8c76-47ed-4ac4-87ff-f96693f048b5.password 
> e5fe5e7e-6722-4845-8149-edaf14065ac0
> 137:     option auth.login./data/brick1/gv0.allow 
> 2d6e8c76-47ed-4ac4-87ff-f96693f048b5
> 138:     subvolumes gv0-io-stats
> 139: end-volume
> 140:
> 141: volume gv0-server
> 142:     type protocol/server
> 143:     option transport.rdma.listen-port 49152
> 144:     option rpc-auth.auth-glusterfs on
> 145:     option rpc-auth.auth-unix on
> 146:     option rpc-auth.auth-null on
> 147:     option rpc-auth-allow-insecure on
> 148:     option transport-type rdma
> 149:     option auth.login./data/brick1/gv0.allow 
> 2d6e8c76-47ed-4ac4-87ff-f96693f048b5
> 150:     option auth.login.2d6e8c76-47ed-4ac4-87ff-f96693f048b5.password 
> e5fe5e7e-6722-4845-8149-edaf14065ac0
> 151:     option auth-path /data/brick1/gv0
> 152:     option auth.addr./data/brick1/gv0.allow *
> 153:     subvolumes /data/brick1/gv0
> 154: end-volume
> 155:
> +------------------------------------------------------------------------------+
>
> Anyway, gluster reports that the volume started successfully.
>
> gluster> volume info gv0
>
> Volume Name: gv0
> Type: Stripe
> Volume ID: 6491a59c-866f-4a1d-b21b-f894ea0e50cd
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: rdma
> Bricks:
> Brick1: gluster-s1-fdr:/data/brick1/gv0
> Brick2: gluster-s2-fdr:/data/brick1/gv0
> Options Reconfigured:
> nfs.disable: on
> gluster>
> gluster> volume status gv0
> Status of volume: gv0
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick gluster-s1-fdr:/data/brick1/gv0       0         49152      Y       2553
> Brick gluster-s2-fdr:/data/brick1/gv0       0         49152      Y       2580
>
> Task Status of Volume gv0
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
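Side note: the TCP Port column shows 0 because the volume was created with
transport rdma only. If you ever want to compare the two paths on the same
volume, gluster can register both transports at create time; a sketch, with
the same bricks and layout as in your example:

    gluster volume create gv0 stripe 2 transport tcp,rdma \
        gluster-s1-fdr:/data/brick1/gv0 gluster-s2-fdr:/data/brick1/gv0

Clients can then choose the transport at mount time, e.g. with
-o transport=rdma.
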
> I proceeded to mount.  I did:
>
> [root@gluster-s1 ~]# mount -t glusterfs glusterfs-s1-fdr:/gv0 /mnt
> Mount failed. Please check the log file for more details.
>
> The following was written to mnt.log:
>
> [2017-08-16 11:09:08.794585] I [MSGID: 100030] [glusterfsd.c:2475:main] 
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.3 
> (args: /usr/sbin/glusterfs --volfile-server=glusterfs-s1-fdr 
> --volfile-id=/gv0 /mnt)
> [2017-08-16 11:09:08.949784] E [MSGID: 101075] 
> [common-utils.c:307:gf_resolve_ip6] 0-resolver: getaddrinfo failed (unknown 
> name or service)
> [2017-08-16 11:09:08.949815] E 
> [name.c:262:af_inet_client_get_remote_sockaddr] 0-glusterfs: DNS resolution 
> failed on host glusterfs-s1-fdr
> [2017-08-16 11:09:08.949956] I [glusterfsd-mgmt.c:2134:mgmt_rpc_notify] 
> 0-glusterfsd-mgmt: disconnected from remote-host: glusterfs-s1-fdr
> [2017-08-16 11:09:08.950097] I [glusterfsd-mgmt.c:2155:mgmt_rpc_notify] 
> 0-glusterfsd-mgmt: Exhausted all volfile servers

This is the problem. To use rdma as the transport, the hostnames and IPs
used at volume creation must be associated with the RDMA interface. Your
DNS entries and IPs appear to point at another Ethernet NIC on the same
node.
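
As a quick sanity check (a sketch; gluster-s1-fdr is the brick hostname from
your volume info, and ib0 is an assumed IPoIB interface name, so substitute
your own):

    getent hosts gluster-s1-fdr   # should resolve to the InfiniBand address
    ip -4 addr show ib0           # that address should be on the IPoIB interface

Note also that the mount command used glusterfs-s1-fdr while the volume was
created with gluster-s1-fdr; the getaddrinfo failure above is on the former
name, so first make sure the name passed to mount resolves at all.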

> [2017-08-16 11:09:08.950105] I [MSGID: 101190] 
> [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with 
> index 1
> [2017-08-16 11:09:08.950277] W [glusterfsd.c:1332:cleanup_and_exit] 
> (-->/lib64/libgfrpc.so.0(rpc_clnt_notify+0xab) [0x7fdfa46bba2b] 
> -->/usr/sbin/glusterfs(+0x10afd) [0x7fdfa4df2afd] 
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7fdfa4debe4b] ) 0-: received 
> signum (1), shutting down
> [2017-08-16 11:09:08.950326] I [fuse-bridge.c:5802:fini] 0-fuse: Unmounting 
> '/mnt'.
> [2017-08-16 11:09:08.950582] W [glusterfsd.c:1332:cleanup_and_exit] 
> (-->/lib64/libpthread.so.0(+0x7dc5) [0x7fdfa3752dc5] 
> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7fdfa4dec025] 
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x7fdfa4debe4b] ) 0-: received 
> signum (15), shutting down