Shaik, Sorry to ask this again. What errors are you seeing in glusterd logs? Can you share the latest logs?
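For example, something along these lines should be enough to capture them (a minimal sketch, assuming the default log locations inside the glusterfs pods, i.e. the same layout used in the quoted thread below, and the pod name from your earlier output): oc rsh glusterfs-storage-vll7x ; grep ' E ' /var/log/glusterfs/glusterd.log | tail -n 50   # error-level entries from the glusterd log ; tail -n 100 /var/log/glusterfs/cmd_history.log   # the most recent commands glusterd handled, for context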
On Thu, Jan 24, 2019 at 2:05 PM Shaik Salam <[email protected]> wrote: > Hi Sanju, > > Please find the requested information. > > Are you still seeing the error "Unable to read pidfile:" in glusterd log? > >>>> No > Are you seeing "brick is deemed not to be a part of the volume" error in > glusterd log?>>>> No > > sh-4.2# getfattr -m -d -e hex > /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick > sh-4.2# getfattr -m -d -e hex > /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae1^C8ab7782dd57cf5b6c1/brick > sh-4.2# pwd > > /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick > sh-4.2# getfattr -m -d -e hex > /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick > sh-4.2# getfattr -m -d -e hex > /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/ > sh-4.2# getfattr -m -d -e hex > /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/ > sh-4.2# getfattr -m -d -e hex > /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/ > sh-4.2# getfattr -d -m . -e hex > /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/ > getfattr: Removing leading '/' from absolute path names > # file: > var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick/ > > security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000 > trusted.afr.dirty=0x000000000000000000000000 > > trusted.afr.vol_3442e86b6d994a14de73f1b8c82cf0b8-client-0=0x000000000000000000000000 > trusted.gfid=0x00000000000000000000000000000001 > trusted.glusterfs.dht=0x000000010000000000000000ffffffff > trusted.glusterfs.volume-id=0x15477f3622e84757a0ce9000b63fa849 > > sh-4.2# ls -la |wc -l > 86 > sh-4.2# pwd > > /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick > sh-4.2# > > > > From: "Sanju Rakonde" <[email protected]> > To: "Shaik Salam" <[email protected]> > Cc: "Amar Tumballi Suryanarayan" <[email protected]>, " > [email protected] List" <[email protected]>, "Murali > Kottakota" <[email protected]> > Date: 01/24/2019 01:38 PM > Subject: Re: [Gluster-users] [Bugs] Bricks are going offline > unable to recover with heal/start force commands > ------------------------------ > > > > *"External email. Open with Caution"* > Shaik, > > Previously I was suspecting that the brick pid file was missing, but I see > it is present. > > From the second node (this brick is in the offline state): > > /var/run/gluster/vols/vol_3442e86b6d994a14de73f1b8c82cf0b8/192.168.3.5-var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.pid > 271 > Are you still seeing the error "Unable to read pidfile:" in glusterd log? > > I also suspect the brick might be missing its extended attributes. Are you > seeing the "brick is deemed not to be a part of the volume" error in glusterd > log? If not, can you please provide us the output of "getfattr -m -d -e hex > <brickpath>" > > On Thu, Jan 24, 2019 at 12:18 PM Shaik Salam <*[email protected]* > <[email protected]>> wrote: > Hi Sanju, > > Could you please have a look at my issue if you have time (or at least provide > a workaround). 
> > BR > Salam > > > > From: Shaik Salam/HYD/TCS > To: "Sanju Rakonde" <*[email protected]* <[email protected]>> > Cc: "Amar Tumballi Suryanarayan" <*[email protected]* > <[email protected]>>, "*[email protected]* > <[email protected]> List" <*[email protected]* > <[email protected]>>, "Murali Kottakota" < > *[email protected]* <[email protected]>> > Date: 01/23/2019 05:50 PM > Subject: Re: [Gluster-users] [Bugs] Bricks are going offline > unable to recover with heal/start force commands > ------------------------------ > > > > > Hi Sanju, > > Please find the requested information. > > Sorry to repeat this again: I am trying the start force command after enabling the > brick log at debug level, taking one volume as an example. > Please correct me if I am doing anything wrong. > > > [root@master ~]# oc rsh glusterfs-storage-vll7x > sh-4.2# gluster volume info vol_3442e86b6d994a14de73f1b8c82cf0b8 > > Volume Name: vol_3442e86b6d994a14de73f1b8c82cf0b8 > Type: Replicate > Volume ID: 15477f36-22e8-4757-a0ce-9000b63fa849 > Status: Started > Snapshot Count: 0 > Number of Bricks: 1 x 3 = 3 > Transport-type: tcp > Bricks: > Brick1: 192.168.3.6: > /var/lib/heketi/mounts/vg_ca57f326195c243be2380ce4e42a4191/brick_952d75fd193c7209c9a81acbc23a3747/brick > Brick2: 192.168.3.5: > /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/ > brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick > Brick3: 192.168.3.15: > /var/lib/heketi/mounts/vg_462ea199185376b03e4b0317363bb88c/brick_1736459d19e8aaa1dcb5a87f48747d04/brick > Options Reconfigured: > diagnostics.brick-log-level: INFO > performance.client-io-threads: off > nfs.disable: on > transport.address-family: inet > sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8 > Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8 > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick 192.168.3.6:/var/lib/heketi/mounts/vg > _ca57f326195c243be2380ce4e42a4191/brick_952 > d75fd193c7209c9a81acbc23a3747/brick 49157 0 Y > 250 > Brick 192.168.3.5:/var/lib/heketi/mounts/vg > _d5f17487744584e3652d3ca943b0b91b/brick_e15 > c12cceae12c8ab7782dd57cf5b6c1/brick N/A N/A N > N/A > Brick 192.168.3.15:/var/lib/heketi/mounts/v > g_462ea199185376b03e4b0317363bb88c/brick_17 > 36459d19e8aaa1dcb5a87f48747d04/brick 49173 0 Y > 225 > Self-heal Daemon on localhost N/A N/A Y > 108434 > Self-heal Daemon on matrix1.matrix.orange.l > ab N/A N/A Y > 69525 > Self-heal Daemon on matrix2.matrix.orange.l > ab N/A N/A Y > 18569 > > gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 > diagnostics.brick-log-level DEBUG > volume set: success > sh-4.2# gluster volume get vol_3442e86b6d994a14de73f1b8c82cf0b8 all |grep > log > cluster.entry-change-log on > cluster.data-change-log on > cluster.metadata-change-log on > diagnostics.brick-log-level DEBUG > > sh-4.2# cd /var/log/glusterfs/bricks/ > sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1 > -rw-------. 1 root root 0 Jan 20 02:46 > > var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log > >>> Nothing in the log > > -rw-------. 
1 root root 189057 Jan 18 09:20 > var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log-20190120 > > [2019-01-23 11:49:32.475956] I [run.c:241:runner_log] > (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) > [0x7fca9e139b3a] > -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) > [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fcaa346f0e5] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/set/post/S30samba-set.sh > --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 -o > diagnostics.brick-log-level=DEBUG --gd-workdir=/var/lib/glusterd > [2019-01-23 11:49:32.483191] I [run.c:241:runner_log] > (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) > [0x7fca9e139b3a] > -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) > [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fcaa346f0e5] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/set/post/S32gluster_enable_shared_storage.sh > --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 -o > diagnostics.brick-log-level=DEBUG --gd-workdir=/var/lib/glusterd > [2019-01-23 11:48:59.111292] W [MSGID: 106036] > [glusterd-snapshot.c:9514:glusterd_handle_snapshot_fn] 0-management: > Snapshot list failed > [2019-01-23 11:50:14.112271] E [MSGID: 106026] > [glusterd-snapshot.c:3962:glusterd_handle_snapshot_list] 0-management: > Volume (vol_63854b105c40802bdec77290e91858ea) does not exist [Invalid > argument] > [2019-01-23 11:50:14.112305] W [MSGID: 106036] > [glusterd-snapshot.c:9514:glusterd_handle_snapshot_fn] 0-management: > Snapshot list failed > [2019-01-23 11:50:20.322902] I > [glusterd-utils.c:5994:glusterd_brick_start] 0-management: discovered > already-running brick > /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick > [2019-01-23 11:50:20.322925] I [MSGID: 106142] > [glusterd-pmap.c:297:pmap_registry_bind] 0-pmap: adding brick > /var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick > on port 49165 > [2019-01-23 11:50:20.327557] I [MSGID: 106131] > [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already > stopped > [2019-01-23 11:50:20.327586] I [MSGID: 106568] > [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: nfs service is > stopped > [2019-01-23 11:50:20.327604] I [MSGID: 106599] > [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so > xlator is not installed > [2019-01-23 11:50:20.337735] I [MSGID: 106568] > [glusterd-proc-mgmt.c:87:glusterd_proc_stop] 0-management: Stopping > glustershd daemon running in pid: 69525 > [2019-01-23 11:50:21.338058] I [MSGID: 106568] > [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: glustershd > service is stopped > [2019-01-23 11:50:21.338180] I [MSGID: 106567] > [glusterd-svc-mgmt.c:203:glusterd_svc_start] 0-management: Starting > glustershd service > [2019-01-23 11:50:21.348234] I [MSGID: 106131] > [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already > stopped > [2019-01-23 11:50:21.348285] I [MSGID: 106568] > [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: bitd service is > stopped > [2019-01-23 11:50:21.348866] I [MSGID: 106131] > [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already > stopped > [2019-01-23 11:50:21.348883] I [MSGID: 106568] > [glusterd-svc-mgmt.c:235:glusterd_svc_stop] 0-management: scrub service is > stopped > [2019-01-23 11:50:22.356502] I 
[run.c:241:runner_log] > (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) > [0x7fca9e139b3a] > -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) > [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fcaa346f0e5] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh > --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 > --volume-op=start --gd-workdir=/var/lib/glusterd > [2019-01-23 11:50:22.368845] E [run.c:241:runner_log] > (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) > [0x7fca9e139b3a] > -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) > [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fcaa346f0e5] ) 0-management: Failed to execute script: > /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh > --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 > --volume-op=start --gd-workdir=/var/lib/glusterd > > > sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8 > Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8 > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick 192.168.3.6:/var/lib/heketi/mounts/vg > _ca57f326195c243be2380ce4e42a4191/brick_952 > d75fd193c7209c9a81acbc23a3747/brick 49157 0 Y > 250 > Brick 192.168.3.5:/var/lib/heketi/mounts/vg > _d5f17487744584e3652d3ca943b0b91b/brick_e15 > c12cceae12c8ab7782dd57cf5b6c1/brick N/A N/A N > N/A > Brick 192.168.3.15:/var/lib/heketi/mounts/v > g_462ea199185376b03e4b0317363bb88c/brick_17 > 36459d19e8aaa1dcb5a87f48747d04/brick 49173 0 Y > 225 > Self-heal Daemon on localhost N/A N/A Y > 109550 > Self-heal Daemon on 192.168.3.6 N/A N/A Y > 52557 > Self-heal Daemon on 192.168.3.15 N/A N/A Y > 16946 > > Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8 > > ------------------------------------------------------------------------------ > There are no active volume tasks > > > > > From: "Sanju Rakonde" <*[email protected]* <[email protected]>> > To: "Shaik Salam" <*[email protected]* <[email protected]>> > Cc: "Amar Tumballi Suryanarayan" <*[email protected]* > <[email protected]>>, "*[email protected]* > <[email protected]> List" <*[email protected]* > <[email protected]>>, "Murali Kottakota" < > *[email protected]* <[email protected]>> > Date: 01/23/2019 02:15 PM > Subject: Re: [Gluster-users] [Bugs] Bricks are going offline > unable to recover with heal/start force commands > ------------------------------ > > > > * "External email. Open with Caution"* > Hi Shaik, > > I can see below errors in glusterd logs. 
> > [2019-01-22 09:20:17.540196] E [MSGID: 101012] > [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: > /var/run/gluster/vols/vol_e1aa1283d5917485d88c4a742eeff422/192.168.3.6-var-lib-heketi-mounts-vg_526f35058433c6b03130bba4e0a7dd87-brick_9e7c382e5f853d471c347bc5590359af-brick.pid > > [2019-01-22 09:20:17.546408] E [MSGID: 101012] > [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: > /var/run/gluster/vols/vol_f0ed498d7e781d7bb896244175b31f9e/192.168.3.6-var-lib-heketi-mounts-vg_56391bec3c8bfe4fc116de7bddfc2af4-brick_47ed9e0663ad0f6f676ddd6ad7e3dcde-brick.pid > > [2019-01-22 09:20:17.552575] E [MSGID: 101012] > [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: > /var/run/gluster/vols/vol_f387519c9b004ec14e80696db88ef0f8/192.168.3.6-var-lib-heketi-mounts-vg_56391bec3c8bfe4fc116de7bddfc2af4-brick_06ad6c73dfbf6a5fc21334f98c9973c2-brick.pid > > [2019-01-22 09:20:17.558888] E [MSGID: 101012] > [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: > /var/run/gluster/vols/vol_f8ca343c60e6efe541fe02d16ca02a7d/192.168.3.6-var-lib-heketi-mounts-vg_526f35058433c6b03130bba4e0a7dd87-brick_525225f65753b05dfe33aeaeb9c5de39-brick.pid > > [2019-01-22 09:20:17.565266] E [MSGID: 101012] > [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: > /var/run/gluster/vols/vol_fe882e074c0512fd9271fc2ff5a0bfe1/192.168.3.6-var-lib-heketi-mounts-vg_28708570b029e5eff0a996c453a11691-brick_d4f30d6e465a8544b759a7016fb5aab5-brick.pid > > [2019-01-22 09:20:17.585926] E [MSGID: 106028] > [glusterd-utils.c:8222:glusterd_brick_signal] 0-glusterd: Unable to get pid > of brick process > [2019-01-22 09:20:17.617806] E [MSGID: 106028] > [glusterd-utils.c:8222:glusterd_brick_signal] 0-glusterd: Unable to get pid > of brick process > [2019-01-22 09:20:17.649628] E [MSGID: 101012] > [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: > /var/run/gluster/glustershd/glustershd.pid > [2019-01-22 09:20:17.649700] E [MSGID: 101012] > [common-utils.c:4010:gf_is_service_running] 0-: Unable to read pidfile: > /var/run/gluster/glustershd/glustershd.pid > > So it looks like neither gf_is_service_running() > nor glusterd_brick_signal() is able to read the pid file. That means the > pidfiles might be empty. > > Can you please paste the contents of the brick pidfiles? You can find the brick > pidfiles in /var/run/gluster/vols/<volname>/ or you can just run this > command "for i in `ls /var/run/gluster/vols/*/*.pid`;do echo $i;cat > $i;done" > > On Wed, Jan 23, 2019 at 12:49 PM Shaik Salam <*[email protected]* > <[email protected]>> wrote: > Hi Sanju, > > Please find the requested information in the attached logs. > > > > > The brick below is offline; I have tried the start force and heal commands but it doesn't > come up. 
> > sh-4.2# > sh-4.2# gluster --version > glusterfs 4.1.5 > > > sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8 > Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8 > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick 192.168.3.6:/var/lib/heketi/mounts/vg > _ca57f326195c243be2380ce4e42a4191/brick_952 > d75fd193c7209c9a81acbc23a3747/brick 49166 0 Y > 269 > Brick 192.168.3.5:/var/lib/heketi/mounts/vg > _d5f17487744584e3652d3ca943b0b91b/brick_e15 > c12cceae12c8ab7782dd57cf5b6c1/brick N/A N/A N > N/A > Brick 192.168.3.15:/var/lib/heketi/mounts/v > g_462ea199185376b03e4b0317363bb88c/brick_17 > 36459d19e8aaa1dcb5a87f48747d04/brick 49173 0 Y > 225 > Self-heal Daemon on localhost N/A N/A Y > 45826 > Self-heal Daemon on 192.168.3.6 N/A N/A Y > 65196 > Self-heal Daemon on 192.168.3.15 N/A N/A Y > 52915 > > Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8 > > ------------------------------------------------------------------------------ > > > We can see following events from when we start forcing volumes > > /mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] > -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) > [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fcaa346f0e5] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh > --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 > --volume-op=start --gd-workdir=/var/lib/glusterd > [2019-01-21 08:22:34.555068] E [run.c:241:runner_log] > (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) > [0x7fca9e139b3a] > -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) > [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fcaa346f0e5] ) 0-management: Failed to execute script: > /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh > --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 > --volume-op=start --gd-workdir=/var/lib/glusterd > [2019-01-21 08:22:53.389049] I [MSGID: 106499] > [glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume vol_3442e86b6d994a14de73f1b8c82cf0b8 > [2019-01-21 08:23:25.346839] I [MSGID: 106487] > [glusterd-handler.c:1486:__glusterd_handle_cli_list_friends] 0-glusterd: > Received cli list req > > > We can see following events from when we heal volumes. 
> > [2019-01-21 08:20:07.576070] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glusterfs: error returned while attempting to connect to host:(null), > port:0 > [2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk] > 0-cli: Received resp to heal volume > [2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1 > [2019-01-21 08:22:30.423311] I [cli.c:768:main] 0-cli: Started running > gluster with version 4.1.5 > [2019-01-21 08:22:30.463648] I [MSGID: 101190] > [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler] > 0-transport: EPOLLERR - disconnecting now > [2019-01-21 08:22:30.463859] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glusterfs: error returned while attempting to connect to host:(null), > port:0 > [2019-01-21 08:22:33.427710] I [socket.c:2632:socket_event_handler] > 0-transport: EPOLLERR - disconnecting now > [2019-01-21 08:22:34.581555] I > [cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start > volume > [2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0 > [2019-01-21 08:22:53.345351] I [cli.c:768:main] 0-cli: Started running > gluster with version 4.1.5 > [2019-01-21 08:22:53.387992] I [MSGID: 101190] > [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-01-21 08:22:53.388059] I [socket.c:2632:socket_event_handler] > 0-transport: EPOLLERR - disconnecting now > [2019-01-21 08:22:53.388138] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glusterfs: error returned while attempting to connect to host:(null), > port:0 > [2019-01-21 08:22:53.394737] I [input.c:31:cli_batch] 0-: Exiting with: 0 > [2019-01-21 08:23:25.304688] I [cli.c:768:main] 0-cli: Started running > gluster with version 4.1.5 > [2019-01-21 08:23:25.346319] I [MSGID: 101190] > [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-01-21 08:23:25.346389] I [socket.c:2632:socket_event_handler] > 0-transport: EPOLLERR - disconnecting now > [2019-01-21 08:23:25.346500] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glusterfs: error returned while attempting to connect to host:(null), > port:0 > > > Enabled DEBUG mode for brick level. But nothing writing to brick log. > > gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 > diagnostics.brick-log-level DEBUG > > sh-4.2# pwd > /var/log/glusterfs/bricks > > sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1 > -rw-------. 1 root root 0 Jan 20 02:46 > var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log > > > > > > > From: Sanju Rakonde <*[email protected]* <[email protected]>> > To: Shaik Salam <*[email protected]* <[email protected]>> > Cc: Amar Tumballi Suryanarayan <*[email protected]* > <[email protected]>>, "*[email protected]* > <[email protected]> List" <*[email protected]* > <[email protected]>> > Date: 01/22/2019 02:21 PM > Subject: Re: [Gluster-users] [Bugs] Bricks are going offline > unable to recover with heal/start force commands > ------------------------------ > > > > * "External email. Open with Caution"* > Hi Shaik, > > Can you please provide us complete glusterd and cmd_history logs from all > the nodes in the cluster? Also please paste output of the following > commands (from all nodes): > 1. gluster --version > 2. gluster volume info > 3. gluster volume status > 4. gluster peer status > 5. 
ps -ax | grep glusterfsd > > On Tue, Jan 22, 2019 at 12:47 PM Shaik Salam <*[email protected]* > <[email protected]>> wrote: > Hi Surya, > > It is already a customer setup and we can't redeploy it again. > I enabled debug for the brick-level log but nothing is being written to it. > Can you suggest any other ways to troubleshoot, or other logs to look at? > > > From: Shaik Salam/HYD/TCS > To: "Amar Tumballi Suryanarayan" <*[email protected]* > <[email protected]>> > Cc: "*[email protected]* <[email protected]> List" > <*[email protected]* <[email protected]>> > Date: 01/22/2019 12:06 PM > Subject: Re: [Bugs] Bricks are going offline unable to recover > with heal/start force commands > ------------------------------ > > > Hi Surya, > > I have enabled DEBUG mode at the brick level, but nothing is being written to the brick > log. > > gluster volume set vol_3442e86b6d994a14de73f1b8c82cf0b8 > diagnostics.brick-log-level DEBUG > > sh-4.2# pwd > /var/log/glusterfs/bricks > > sh-4.2# ls -la |grep brick_e15c12cceae12c8ab7782dd57cf5b6c1 > -rw-------. 1 root root 0 Jan 20 02:46 > var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log > > BR > Salam > > > > > From: "Amar Tumballi Suryanarayan" <*[email protected]* > <[email protected]>> > To: "Shaik Salam" <*[email protected]* <[email protected]>> > Cc: "*[email protected]* <[email protected]> List" > <*[email protected]* <[email protected]>> > Date: 01/22/2019 11:38 AM > Subject: Re: [Bugs] Bricks are going offline unable to recover > with heal/start force commands > ------------------------------ > > > > * "External email. Open with Caution"* > Hi Shaik, > > Can you check what is in the brick logs? They are located in > /var/log/glusterfs/bricks/*. > > Looks like the samba hooks script failed, but that shouldn't matter in > this use case. > > Also, I see that you are trying to set up heketi to provision volumes, > which means you may be using gluster in container use cases. If you are > still in the 'PoC' phase, can you give *https://github.com/gluster/gcs* > <https://github.com/gluster/gcs> a try? That makes the deployment and the > stack a little simpler. > > -Amar > > > > > On Tue, Jan 22, 2019 at 11:29 AM Shaik Salam <*[email protected]* > <[email protected]>> wrote: > Can anyone advise how to recover the bricks, apart from heal/start force, > given the events from the logs below? > Please let me know if any other logs are required. > Thanks in advance. > > BR > Salam > > > > From: Shaik Salam/HYD/TCS > To: *[email protected]* <[email protected]>, > *[email protected]* <[email protected]> > Date: 01/21/2019 10:03 PM > Subject: Bricks are going offline unable to recover with > heal/start force commands > ------------------------------ > > > Hi, > > Bricks are offline and I am unable to recover them with the following commands: > > gluster volume heal <vol-name> > > gluster volume start <vol-name> force > > But the bricks are still offline. 
> > > sh-4.2# gluster volume status vol_3442e86b6d994a14de73f1b8c82cf0b8 > Status of volume: vol_3442e86b6d994a14de73f1b8c82cf0b8 > Gluster process TCP Port RDMA Port Online > Pid > > ------------------------------------------------------------------------------ > Brick 192.168.3.6:/var/lib/heketi/mounts/vg > _ca57f326195c243be2380ce4e42a4191/brick_952 > d75fd193c7209c9a81acbc23a3747/brick 49166 0 Y > 269 > Brick 192.168.3.5:/var/lib/heketi/mounts/vg > _d5f17487744584e3652d3ca943b0b91b/brick_e15 > c12cceae12c8ab7782dd57cf5b6c1/brick N/A N/A N > N/A > Brick 192.168.3.15:/var/lib/heketi/mounts/v > g_462ea199185376b03e4b0317363bb88c/brick_17 > 36459d19e8aaa1dcb5a87f48747d04/brick 49173 0 Y > 225 > Self-heal Daemon on localhost N/A N/A Y > 45826 > Self-heal Daemon on 192.168.3.6 N/A N/A Y > 65196 > Self-heal Daemon on 192.168.3.15 N/A N/A Y > 52915 > > Task Status of Volume vol_3442e86b6d994a14de73f1b8c82cf0b8 > > ------------------------------------------------------------------------------ > > > We can see following events from when we start forcing volumes > > /mgmt/glusterd.so(+0xe2b3a) [0x7fca9e139b3a] > -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2605) > [0x7fca9e139605] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fcaa346f0e5] ) 0-management: Ran script: > /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh > --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 > --volume-op=start --gd-workdir=/var/lib/glusterd > [2019-01-21 08:22:34.555068] E [run.c:241:runner_log] > (-->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2b3a) > [0x7fca9e139b3a] > -->/usr/lib64/glusterfs/4.1.5/xlator/mgmt/glusterd.so(+0xe2563) > [0x7fca9e139563] -->/lib64/libglusterfs.so.0(runner_log+0x115) > [0x7fcaa346f0e5] ) 0-management: Failed to execute script: > /var/lib/glusterd/hooks/1/start/post/S30samba-start.sh > --volname=vol_3442e86b6d994a14de73f1b8c82cf0b8 --first=no --version=1 > --volume-op=start --gd-workdir=/var/lib/glusterd > [2019-01-21 08:22:53.389049] I [MSGID: 106499] > [glusterd-handler.c:4314:__glusterd_handle_status_volume] 0-management: > Received status volume req for volume vol_3442e86b6d994a14de73f1b8c82cf0b8 > [2019-01-21 08:23:25.346839] I [MSGID: 106487] > [glusterd-handler.c:1486:__glusterd_handle_cli_list_friends] 0-glusterd: > Received cli list req > > > We can see following events from when we heal volumes. 
> > [2019-01-21 08:20:07.576070] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glusterfs: error returned while attempting to connect to host:(null), > port:0 > [2019-01-21 08:20:07.580225] I [cli-rpc-ops.c:9182:gf_cli_heal_volume_cbk] > 0-cli: Received resp to heal volume > [2019-01-21 08:20:07.580326] I [input.c:31:cli_batch] 0-: Exiting with: -1 > [2019-01-21 08:22:30.423311] I [cli.c:768:main] 0-cli: Started running > gluster with version 4.1.5 > [2019-01-21 08:22:30.463648] I [MSGID: 101190] > [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-01-21 08:22:30.463718] I [socket.c:2632:socket_event_handler] > 0-transport: EPOLLERR - disconnecting now > [2019-01-21 08:22:30.463859] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glusterfs: error returned while attempting to connect to host:(null), > port:0 > [2019-01-21 08:22:33.427710] I [socket.c:2632:socket_event_handler] > 0-transport: EPOLLERR - disconnecting now > [2019-01-21 08:22:34.581555] I > [cli-rpc-ops.c:1472:gf_cli_start_volume_cbk] 0-cli: Received resp to start > volume > [2019-01-21 08:22:34.581678] I [input.c:31:cli_batch] 0-: Exiting with: 0 > [2019-01-21 08:22:53.345351] I [cli.c:768:main] 0-cli: Started running > gluster with version 4.1.5 > [2019-01-21 08:22:53.387992] I [MSGID: 101190] > [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-01-21 08:22:53.388059] I [socket.c:2632:socket_event_handler] > 0-transport: EPOLLERR - disconnecting now > [2019-01-21 08:22:53.388138] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glusterfs: error returned while attempting to connect to host:(null), > port:0 > [2019-01-21 08:22:53.394737] I [input.c:31:cli_batch] 0-: Exiting with: 0 > [2019-01-21 08:23:25.304688] I [cli.c:768:main] 0-cli: Started running > gluster with version 4.1.5 > [2019-01-21 08:23:25.346319] I [MSGID: 101190] > [event-epoll.c:617:event_dispatch_epoll_worker] 0-epoll: Started thread > with index 1 > [2019-01-21 08:23:25.346389] I [socket.c:2632:socket_event_handler] > 0-transport: EPOLLERR - disconnecting now > [2019-01-21 08:23:25.346500] W [rpc-clnt.c:1753:rpc_clnt_submit] > 0-glusterfs: error returned while attempting to connect to host:(null), > port:0 > > > > Please let us know steps to recover bricks. > > > BR > Salam > =====-----=====-----===== > Notice: The information contained in this e-mail > message and/or attachments to it may contain > confidential or privileged information. If you are > not the intended recipient, any dissemination, use, > review, distribution, printing or copying of the > information contained in this e-mail message > and/or attachments to it are strictly prohibited. If > you have received this communication in error, > please notify us by reply e-mail or telephone and > immediately and permanently delete the message > and any attachments. Thank you > _______________________________________________ > Bugs mailing list > *[email protected]* <[email protected]> > *https://lists.gluster.org/mailman/listinfo/bugs* > <https://lists.gluster.org/mailman/listinfo/bugs> > > > -- > Amar Tumballi (amarts) > _______________________________________________ > Gluster-users mailing list > *[email protected]* <[email protected]> > *https://lists.gluster.org/mailman/listinfo/gluster-users* > <https://lists.gluster.org/mailman/listinfo/gluster-users> > > > -- > Thanks, > Sanju > > > -- > Thanks, > Sanju > > > -- > Thanks, > Sanju > -- Thanks, Sanju
_______________________________________________ Gluster-users mailing list [email protected] https://lists.gluster.org/mailman/listinfo/gluster-users
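For anyone landing on this thread with the same symptom, the checks suggested above boil down to roughly the following (the volume, brick, and pidfile names are the ones from the examples in this thread; treat this as a rough sketch rather than an official recovery procedure):

VOL=vol_3442e86b6d994a14de73f1b8c82cf0b8
BRICK=/var/lib/heketi/mounts/vg_d5f17487744584e3652d3ca943b0b91b/brick_e15c12cceae12c8ab7782dd57cf5b6c1/brick
PIDFILE=/var/run/gluster/vols/$VOL/192.168.3.5-var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.pid

# 1. Is the brick pidfile non-empty, and does the pid belong to a running glusterfsd?
cat "$PIDFILE" && ps -p "$(cat "$PIDFILE")" -o pid,comm

# 2. Does the brick directory still carry its volume-id xattr? (If it is missing,
#    glusterd logs "brick is deemed not to be a part of the volume" and will not start it.)
getfattr -n trusted.glusterfs.volume-id -e hex "$BRICK"

# 3. Retry a forced start with the brick log at DEBUG and watch the brick log
#    for the actual failure reason.
gluster volume set "$VOL" diagnostics.brick-log-level DEBUG
gluster volume start "$VOL" force
tail -f /var/log/glusterfs/bricks/var-lib-heketi-mounts-vg_d5f17487744584e3652d3ca943b0b91b-brick_e15c12cceae12c8ab7782dd57cf5b6c1-brick.log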
