Dennis,

It looks like the add-brick has definitely failed and the entry was not committed to the glusterd store. The volume status and volume info commands read glusterd's in-memory data, where the fs4 brick still exists, but after a restart, when glusterd reloads its state from the store, the brick is no longer listed.

Could you run glusterd with debug logging enabled (systemctl stop glusterd; glusterd -LDEBUG) and send us cmd_history.log and the glusterd log, along with the fs4 brick log files, so we can analyze the issue further?

Regarding the missing RDMA ports for the fs2 and fs3 bricks, can you cross-check whether the glusterfs-rdma package is installed on both nodes?
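If it helps, here is a rough sketch of how the requested logs could be bundled into one tarball once the failure has been reproduced with debug logging. The function name collect_gluster_logs is made up for illustration, and the defaults are assumptions: /var/log/glusterfs as the log directory and etc-glusterfs-glusterd.vol.log as the glusterd log file name. Adjust both if your installation differs.

```shell
#!/bin/sh
# Sketch only, not an official tool. Reproduce the failure first, e.g.:
#   systemctl stop glusterd; glusterd -LDEBUG        (on fs4)
#   gluster volume add-brick cees-data fs4:/data/brick   (on fs1)
# For the RDMA question, assuming an RPM-based distribution, on fs2 and fs3:
#   rpm -q glusterfs-rdma

# collect_gluster_logs [LOG_DIR] [OUTPUT_TARBALL]
# Packs cmd_history.log, the glusterd log and the bricks/ directory
# (whichever of them exist) into a gzip-compressed tarball.
collect_gluster_logs() {
    log_dir="${1:-/var/log/glusterfs}"
    out="${2:-/tmp/gluster-logs-$(hostname).tar.gz}"
    files=""
    # The glusterd log name below is an assumed default.
    for f in cmd_history.log etc-glusterfs-glusterd.vol.log bricks; do
        if [ -e "$log_dir/$f" ]; then
            files="$files $f"
        else
            echo "skipping missing $log_dir/$f" >&2
        fi
    done
    if [ -z "$files" ]; then
        echo "nothing to collect in $log_dir" >&2
        return 1
    fi
    # $files is left unquoted on purpose so it splits into separate names.
    tar -czf "$out" -C "$log_dir" $files && echo "$out"
}
```

Running collect_gluster_logs with no arguments on fs4 prints the path of the tarball it wrote, which can then be attached to a reply.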

On Wed, Sep 28, 2016 at 7:14 AM, Ravishankar N <[email protected]> wrote:

> On 09/27/2016 10:29 PM, Dennis Michael wrote:
>
> [root@fs4 bricks]# gluster volume info
>
> Volume Name: cees-data
> Type: Distribute
> Volume ID: 27d2a59c-bdac-4f66-bcd8-e6124e53a4a2
> Status: Started
> Number of Bricks: 4
> Transport-type: tcp,rdma
> Bricks:
> Brick1: fs1:/data/brick
> Brick2: fs2:/data/brick
> Brick3: fs3:/data/brick
> Brick4: fs4:/data/brick
> Options Reconfigured:
> features.quota-deem-statfs: on
> features.inode-quota: on
> features.quota: on
> performance.readdir-ahead: on
>
> [root@fs4 bricks]# gluster volume status
> Status of volume: cees-data
> Gluster process                  TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick fs1:/data/brick            49152     49153      Y       1878
> Brick fs2:/data/brick            49152     0          Y       1707
> Brick fs3:/data/brick            49152     0          Y       4696
> Brick fs4:/data/brick            N/A       N/A        N       N/A
> NFS Server on localhost          2049      0          Y       13808
> Quota Daemon on localhost        N/A       N/A        Y       13813
> NFS Server on fs1                2049      0          Y       6722
> Quota Daemon on fs1              N/A       N/A        Y       6730
> NFS Server on fs3                2049      0          Y       12553
> Quota Daemon on fs3              N/A       N/A        Y       12561
> NFS Server on fs2                2049      0          Y       11702
> Quota Daemon on fs2              N/A       N/A        Y       11710
>
> Task Status of Volume cees-data
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> [root@fs4 bricks]# ps auxww | grep gluster
> root 13791 0.0 0.0 701472 19768 ? Ssl 09:06 0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
> root 13808 0.0 0.0 560236 41420 ? Ssl 09:07 0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/01c61523374369658a62b75c582b5ac2.socket
> root 13813 0.0 0.0 443164 17908 ? Ssl 09:07 0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/lib/glusterd/quotad/run/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/3753def90f5c34f656513dba6a544f7d.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
> root 13874 0.0 0.0 1200472 31700 ? Ssl 09:16 0:00 /usr/sbin/glusterfsd -s fs4 --volfile-id cees-data.fs4.data-brick -p /var/lib/glusterd/vols/cees-data/run/fs4-data-brick.pid -S /var/run/gluster/5203ab38be21e1d37c04f6bdfee77d4a.socket --brick-name /data/brick -l /var/log/glusterfs/bricks/data-brick.log --xlator-option *-posix.glusterd-uuid=f04b231e-63f8-4374-91ae-17c0c623f165 --brick-port 49152 49153 --xlator-option cees-data-server.transport.rdma.listen-port=49153 --xlator-option cees-data-server.listen-port=49152 --volfile-server-transport=socket,rdma
> root 13941 0.0 0.0 112648 976 pts/0 S+ 09:50 0:00 grep --color=auto gluster
>
> [root@fs4 bricks]# systemctl restart glusterfsd glusterd
>
> [root@fs4 bricks]# ps auxww | grep gluster
> root 13808 0.0 0.0 560236 41420 ? Ssl 09:07 0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/01c61523374369658a62b75c582b5ac2.socket
> root 13813 0.0 0.0 443164 17908 ? Ssl 09:07 0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/lib/glusterd/quotad/run/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/3753def90f5c34f656513dba6a544f7d.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
> root 13953 0.1 0.0 570740 14988 ? Ssl 09:51 0:00 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
> root 13965 0.0 0.0 112648 976 pts/0 S+ 09:51 0:00 grep --color=auto gluster
>
> [root@fs4 bricks]# gluster volume info
>
> Volume Name: cees-data
> Type: Distribute
> Volume ID: 27d2a59c-bdac-4f66-bcd8-e6124e53a4a2
> Status: Started
> Number of Bricks: 3
> Transport-type: tcp,rdma
> Bricks:
> Brick1: fs1:/data/brick
> Brick2: fs2:/data/brick
> Brick3: fs3:/data/brick
> Options Reconfigured:
> performance.readdir-ahead: on
> features.quota: on
> features.inode-quota: on
> features.quota-deem-statfs: on
>
>
> I'm not sure what's going on here. Restarting glusterd seems to change the
> output of gluster volume info? I also see you are using RDMA. Not sure why
> the RDMA ports for fs2 and fs3 are not shown in the volume status output.
> CC'ing some glusterd/rdma devs for pointers.
>
> -Ravi
>
>
> [root@fs4 bricks]# gluster volume status
> Status of volume: cees-data
> Gluster process                  TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick fs1:/data/brick            49152     49153      Y       1878
> Brick fs2:/data/brick            49152     0          Y       1707
> Brick fs3:/data/brick            49152     0          Y       4696
> NFS Server on localhost          2049      0          Y       13968
> Quota Daemon on localhost        N/A       N/A        Y       13976
> NFS Server on fs2                2049      0          Y       11702
> Quota Daemon on fs2              N/A       N/A        Y       11710
> NFS Server on fs3                2049      0          Y       12553
> Quota Daemon on fs3              N/A       N/A        Y       12561
> NFS Server on fs1                2049      0          Y       6722
>
> Task Status of Volume cees-data
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> [root@fs4 bricks]# gluster peer status
> Number of Peers: 3
>
> Hostname: fs1
> Uuid: ddc0a23e-05e5-48f7-993e-a37e43b21605
> State: Peer in Cluster (Connected)
>
> Hostname: fs2
> Uuid: e37108f8-d2f1-4f28-adc8-0b3d3401df29
> State: Peer in Cluster (Connected)
>
> Hostname: fs3
> Uuid: 19a42201-c932-44db-b1a7-8b5b1af32a36
> State: Peer in Cluster (Connected)
>
> Dennis
>
>
> On Tue, Sep 27, 2016 at 9:40 AM, Ravishankar N <[email protected]> wrote:
>
>> On 09/27/2016 09:53 PM, Dennis Michael wrote:
>>
>> Yes, you are right. I mixed up the logs. I just ran the add-brick
>> command again after cleaning up fs4 and re-installing gluster. This is the
>> complete fs4 data-brick.log.
>>
>> [root@fs1 ~]# gluster volume add-brick cees-data fs4:/data/brick
>> volume add-brick: failed: Commit failed on fs4. Please check log file for details.
>>
>> [root@fs4 bricks]# pwd
>> /var/log/glusterfs/bricks
>> [root@fs4 bricks]# cat data-brick.log
>> [2016-09-27 16:16:28.095661] I [MSGID: 100030] [glusterfsd.c:2338:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.7.14 (args: /usr/sbin/glusterfsd -s fs4 --volfile-id cees-data.fs4.data-brick -p /var/lib/glusterd/vols/cees-data/run/fs4-data-brick.pid -S /var/run/gluster/5203ab38be21e1d37c04f6bdfee77d4a.socket --brick-name /data/brick -l /var/log/glusterfs/bricks/data-brick.log --xlator-option *-posix.glusterd-uuid=f04b231e-63f8-4374-91ae-17c0c623f165 --brick-port 49152 --xlator-option cees-data-server.transport.rdma.listen-port=49153 --xlator-option cees-data-server.listen-port=49152 --volfile-server-transport=socket,rdma)
>> [2016-09-27 16:16:28.101547] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>> [2016-09-27 16:16:28.104637] I [graph.c:269:gf_add_cmdline_options] 0-cees-data-server: adding option 'listen-port' for volume 'cees-data-server' with value '49152'
>> [2016-09-27 16:16:28.104646] I [graph.c:269:gf_add_cmdline_options] 0-cees-data-server: adding option 'transport.rdma.listen-port' for volume 'cees-data-server' with value '49153'
>> [2016-09-27 16:16:28.104662] I [graph.c:269:gf_add_cmdline_options] 0-cees-data-posix: adding option 'glusterd-uuid' for volume 'cees-data-posix' with value 'f04b231e-63f8-4374-91ae-17c0c623f165'
>> [2016-09-27 16:16:28.104808] I [MSGID: 115034] [server.c:403:_check_for_auth_option] 0-/data/brick: skip format check for non-addr auth option auth.login./data/brick.allow
>> [2016-09-27 16:16:28.104814] I [MSGID: 115034] [server.c:403:_check_for_auth_option] 0-/data/brick: skip format check for non-addr auth option auth.login.18ddaf4c-ad98-4155-9372-717eae718b4c.password
>> [2016-09-27 16:16:28.104883] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>> [2016-09-27 16:16:28.105479] I [rpcsvc.c:2196:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
>> [2016-09-27 16:16:28.105532] W [MSGID: 101002] [options.c:957:xl_opt_validate] 0-cees-data-server: option 'listen-port' is deprecated, preferred is 'transport.socket.listen-port', continuing with correction
>> [2016-09-27 16:16:28.109456] W [socket.c:3665:reconfigure] 0-cees-data-quota: NBIO on -1 failed (Bad file descriptor)
>> [2016-09-27 16:16:28.489255] I [MSGID: 121050] [ctr-helper.c:259:extract_ctr_options] 0-gfdbdatastore: CTR Xlator is disabled.
>> [2016-09-27 16:16:28.489272] W [MSGID: 101105] [gfdb_sqlite3.h:239:gfdb_set_sql_params] 0-cees-data-changetimerecorder: Failed to retrieve sql-db-pagesize from params.Assigning default value: 4096
>> [2016-09-27 16:16:28.489278] W [MSGID: 101105] [gfdb_sqlite3.h:239:gfdb_set_sql_params] 0-cees-data-changetimerecorder: Failed to retrieve sql-db-journalmode from params.Assigning default value: wal
>> [2016-09-27 16:16:28.489284] W [MSGID: 101105] [gfdb_sqlite3.h:239:gfdb_set_sql_params] 0-cees-data-changetimerecorder: Failed to retrieve sql-db-sync from params.Assigning default value: off
>> [2016-09-27 16:16:28.489288] W [MSGID: 101105] [gfdb_sqlite3.h:239:gfdb_set_sql_params] 0-cees-data-changetimerecorder: Failed to retrieve sql-db-autovacuum from params.Assigning default value: none
>> [2016-09-27 16:16:28.490431] I [trash.c:2412:init] 0-cees-data-trash: no option specified for 'eliminate', using NULL
>> [2016-09-27 16:16:28.672814] W [graph.c:357:_log_if_unknown_option] 0-cees-data-server: option 'rpc-auth.auth-glusterfs' is not recognized
>> [2016-09-27 16:16:28.672854] W [graph.c:357:_log_if_unknown_option] 0-cees-data-server: option 'rpc-auth.auth-unix' is not recognized
>> [2016-09-27 16:16:28.672872] W [graph.c:357:_log_if_unknown_option] 0-cees-data-server: option 'rpc-auth.auth-null' is not recognized
>> [2016-09-27 16:16:28.672924] W [graph.c:357:_log_if_unknown_option] 0-cees-data-quota: option 'timeout' is not recognized
>> [2016-09-27 16:16:28.672955] W [graph.c:357:_log_if_unknown_option] 0-cees-data-trash: option 'brick-path' is not recognized
>> Final graph:
>> +------------------------------------------------------------------------------+
>>   1: volume cees-data-posix
>>   2:     type storage/posix
>>   3:     option glusterd-uuid f04b231e-63f8-4374-91ae-17c0c623f165
>>   4:     option directory /data/brick
>>   5:     option volume-id 27d2a59c-bdac-4f66-bcd8-e6124e53a4a2
>>   6:     option update-link-count-parent on
>>   7: end-volume
>>   8:
>>   9: volume cees-data-trash
>>  10:     type features/trash
>>  11:     option trash-dir .trashcan
>>  12:     option brick-path /data/brick
>>  13:     option trash-internal-op off
>>  14:     subvolumes cees-data-posix
>>  15: end-volume
>>  16:
>>  17: volume cees-data-changetimerecorder
>>  18:     type features/changetimerecorder
>>  19:     option db-type sqlite3
>>  20:     option hot-brick off
>>  21:     option db-name brick.db
>>  22:     option db-path /data/brick/.glusterfs/
>>  23:     option record-exit off
>>  24:     option ctr_link_consistency off
>>  25:     option ctr_lookupheal_link_timeout 300
>>  26:     option ctr_lookupheal_inode_timeout 300
>>  27:     option record-entry on
>>  28:     option ctr-enabled off
>>  29:     option record-counters off
>>  30:     option ctr-record-metadata-heat off
>>  31:     option sql-db-cachesize 1000
>>  32:     option sql-db-wal-autocheckpoint 1000
>>  33:     subvolumes cees-data-trash
>>  34: end-volume
>>  35:
>>  36: volume cees-data-changelog
>>  37:     type features/changelog
>>  38:     option changelog-brick /data/brick
>>  39:     option changelog-dir /data/brick/.glusterfs/changelogs
>>  40:     option changelog-barrier-timeout 120
>>  41:     subvolumes cees-data-changetimerecorder
>>  42: end-volume
>>  43:
>>  44: volume cees-data-bitrot-stub
>>  45:     type features/bitrot-stub
>>  46:     option export /data/brick
>>  47:     subvolumes cees-data-changelog
>>  48: end-volume
>>  49:
>>  50: volume cees-data-access-control
>>  51:     type features/access-control
>>  52:     subvolumes cees-data-bitrot-stub
>>  53: end-volume
>>  54:
>>  55: volume cees-data-locks
>>  56:     type features/locks
>>  57:     subvolumes cees-data-access-control
>>  58: end-volume
>>  59:
>>  60: volume cees-data-upcall
>>  61:     type features/upcall
>>  62:     option cache-invalidation off
>>  63:     subvolumes cees-data-locks
>>  64: end-volume
>>  65:
>>  66: volume cees-data-io-threads
>>  67:     type performance/io-threads
>>  68:     subvolumes cees-data-upcall
>>  69: end-volume
>>  70:
>>  71: volume cees-data-marker
>>  72:     type features/marker
>>  73:     option volume-uuid 27d2a59c-bdac-4f66-bcd8-e6124e53a4a2
>>  74:     option timestamp-file /var/lib/glusterd/vols/cees-data/marker.tstamp
>>  75:     option quota-version 1
>>  76:     option xtime off
>>  77:     option gsync-force-xtime off
>>  78:     option quota on
>>  79:     option inode-quota on
>>  80:     subvolumes cees-data-io-threads
>>  81: end-volume
>>  82:
>>  83: volume cees-data-barrier
>>  84:     type features/barrier
>>  85:     option barrier disable
>>  86:     option barrier-timeout 120
>>  87:     subvolumes cees-data-marker
>>  88: end-volume
>>  89:
>>  90: volume cees-data-index
>>  91:     type features/index
>>  92:     option index-base /data/brick/.glusterfs/indices
>>  93:     subvolumes cees-data-barrier
>>  94: end-volume
>>  95:
>>  96: volume cees-data-quota
>>  97:     type features/quota
>>  98:     option transport.socket.connect-path /var/run/gluster/quotad.socket
>>  99:     option transport-type socket
>> 100:     option transport.address-family unix
>> 101:     option volume-uuid cees-data
>> 102:     option server-quota on
>> 103:     option timeout 0
>> 104:     option deem-statfs on
>> 105:     subvolumes cees-data-index
>> 106: end-volume
>> 107:
>> 108: volume cees-data-worm
>> 109:     type features/worm
>> 110:     option worm off
>> 111:     subvolumes cees-data-quota
>> 112: end-volume
>> 113:
>> 114: volume cees-data-read-only
>> 115:     type features/read-only
>> 116:     option read-only off
>> 117:     subvolumes cees-data-worm
>> 118: end-volume
>> 119:
>> 120: volume /data/brick
>> 121:     type debug/io-stats
>> 122:     option log-level INFO
>> 123:     option latency-measurement off
>> 124:     option count-fop-hits off
>> 125:     subvolumes cees-data-read-only
>> 126: end-volume
>> 127:
>> 128: volume cees-data-server
>> 129:     type protocol/server
>> 130:     option transport.socket.listen-port 49152
>> 131:     option rpc-auth.auth-glusterfs on
>> 132:     option rpc-auth.auth-unix on
>> 133:     option rpc-auth.auth-null on
>> 134:     option rpc-auth-allow-insecure on
>> 135:     option transport.rdma.listen-port 49153
>> 136:     option transport-type tcp,rdma
>> 137:     option auth.login./data/brick.allow 18ddaf4c-ad98-4155-9372-717eae718b4c
>> 138:     option auth.login.18ddaf4c-ad98-4155-9372-717eae718b4c.password 9e913e92-7de0-47f9-94ed-d08cbb130d23
>> 139:     option auth.addr./data/brick.allow *
>> 140:     subvolumes /data/brick
>> 141: end-volume
>> 142:
>> +------------------------------------------------------------------------------+
>> [2016-09-27 16:16:30.079541] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 18ddaf4c-ad98-4155-9372-717eae718b4c
>> [2016-09-27 16:16:30.079567] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-cees-data-server: accepted client from fs3-12560-2016/09/27-16:16:30:47674-cees-data-client-3-0-0 (version: 3.7.14)
>> [2016-09-27 16:16:30.081487] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 18ddaf4c-ad98-4155-9372-717eae718b4c
>> [2016-09-27 16:16:30.081505] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-cees-data-server: accepted client from fs2-11709-2016/09/27-16:16:30:50047-cees-data-client-3-0-0 (version: 3.7.14)
>> [2016-09-27 16:16:30.111091] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 18ddaf4c-ad98-4155-9372-717eae718b4c
>> [2016-09-27 16:16:30.111113] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-cees-data-server: accepted client from fs2-11701-2016/09/27-16:16:29:24060-cees-data-client-3-0-0 (version: 3.7.14)
>> [2016-09-27 16:16:30.112822] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 18ddaf4c-ad98-4155-9372-717eae718b4c
>> [2016-09-27 16:16:30.112836] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-cees-data-server: accepted client from fs3-12552-2016/09/27-16:16:29:23041-cees-data-client-3-0-0 (version: 3.7.14)
>> [2016-09-27 16:16:31.950978] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 18ddaf4c-ad98-4155-9372-717eae718b4c
>> [2016-09-27 16:16:31.950998] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-cees-data-server: accepted client from fs1-6721-2016/09/27-16:16:26:939991-cees-data-client-3-0-0 (version: 3.7.14)
>> [2016-09-27 16:16:31.981977] I [login.c:81:gf_auth] 0-auth/login: allowed user names: 18ddaf4c-ad98-4155-9372-717eae718b4c
>> [2016-09-27 16:16:31.981994] I [MSGID: 115029] [server-handshake.c:690:server_setvolume] 0-cees-data-server: accepted client from fs1-6729-2016/09/27-16:16:27:971228-cees-data-client-3-0-0 (version: 3.7.14)
>>
>>
>> Hmm, this shows the brick has started.
>> Does gluster volume info on fs4 shows all 4 bricks? (I guess it does
>> based on your first email).
>> Does gluster volume status on fs4 (or ps aux|grep glusterfsd) show the
>> brick as running?
>> Does gluster peer status on all nodes list the other 3 nodes as
>> connected?
>>
>> If yes, you could try `service glusterd restart` on fs4 and see if if
>> brings up the brick? I'm just shooting in the dark here for possible clues.
>> -Ravi
>>
>> On Tue, Sep 27, 2016 at 8:46 AM, Ravishankar N <[email protected]> wrote:
>>
>>> On 09/27/2016 09:06 PM, Dennis Michael wrote:
>>>
>>> Yes, the brick log /var/log/glusterfs/bricks/data-brick.log is created
>>> on fs4, and the snippets showing the errors were from that log.
>>>
>>> Unless I'm missing something, the snippet below is from glusterd's log
>>> and not the brick's as is evident from the function names.
>>> -Ravi
>>>
>>> Dennis
>>>
>>> On Mon, Sep 26, 2016 at 5:58 PM, Ravishankar N <[email protected]> wrote:
>>>
>>>> On 09/27/2016 05:25 AM, Dennis Michael wrote:
>>>>
>>>>> [2016-09-26 22:44:39.254921] E [MSGID: 106005] [glusterd-utils.c:4771:glusterd_brick_start] 0-management: Unable to start brick fs4:/data/brick
>>>>> [2016-09-26 22:44:39.254949] E [MSGID: 106074] [glusterd-brick-ops.c:2372:glusterd_op_add_brick] 0-glusterd: Unable to add bricks
>>>>
>>>> Is the brick log created on fs4? Does it contain warnings/errors?
>>>>
>>>> -Ravi

--
--Atin
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
