Hello, we are running glusterfs 3.10.3. We currently have a full heal running (gluster volume heal data01 full); the crawl is still in progress.
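For reference, roughly the commands involved here (a sketch; the crawl status pasted below is the output of the statistics sub-command):

  gluster volume heal data01 full          # start the full self-heal crawl
  gluster volume heal data01 statistics    # show crawl progress (output pasted below)
  gluster volume heal data01 info          # list entries still pending heal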
Starting time of crawl: Tue Nov 14 15:58:35 2017
Crawl is in progress
Type of crawl: FULL
No. of entries healed: 0
No. of entries in split-brain: 0
No. of heal failed entries: 0

getfattr from both files:

# getfattr -d -m . -e hex /mnt/AIDATA/data//ishmaelb/experiments/omie/omieali/cifar10/donsker_grad_reg_ali_dcgan_stat_dcgan_ac_True/omieali_cifar10_zdim_100_enc_dcgan_dec_dcgan_stat_dcgan_posterior_propagated_enc_beta1.0_dec_beta_1.0_info_metric_donsker_varadhan_info_lam_0.334726025306_222219-23_10_17/data/data_gen_iter_86000.pkl
getfattr: Removing leading '/' from absolute path names
# file: mnt/AIDATA/data//ishmaelb/experiments/omie/omieali/cifar10/donsker_grad_reg_ali_dcgan_stat_dcgan_ac_True/omieali_cifar10_zdim_100_enc_dcgan_dec_dcgan_stat_dcgan_posterior_propagated_enc_beta1.0_dec_beta_1.0_info_metric_donsker_varadhan_info_lam_0.334726025306_222219-23_10_17/data/data_gen_iter_86000.pkl
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.data01-client-0=0x000000000000000100000000
trusted.gfid=0x7e8513f4d4e24e66b0ba2dbe4c803c54

# getfattr -d -m . -e hex /mnt/AIDATA/data/home/allac/experiments/171023_105655_mini_imagenet_projection_size_mixing_depth_num_filters_filter_size_block_depth_Explore\ architecture\ capacity/Explore\ architecture\ capacity\(projection_size\=32\;mixing_depth\=0\;num_filters\=64\;filter_size\=3\;block_depth\=3\)/model.ckpt-70001.data-00000-of-00001.tempstate1629411508065733704
getfattr: Removing leading '/' from absolute path names
# file: mnt/AIDATA/data/home/allac/experiments/171023_105655_mini_imagenet_projection_size_mixing_depth_num_filters_filter_size_block_depth_Explore architecture capacity/Explore architecture capacity(projection_size=32;mixing_depth=0;num_filters=64;filter_size=3;block_depth=3)/model.ckpt-70001.data-00000-of-00001.tempstate1629411508065733704
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.data01-client-0=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005979d278000af1e7
trusted.gfid=0x9612ecd2106d42f295ebfef495c1d8ab
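For what it's worth, this is how we read the trusted.afr values above, going by the usual AFR changelog layout: the 24 hex digits are three big-endian 32-bit counters (data, metadata, entry pending operations) blamed against the brick named in the key, client-0 being Brick1 here. A quick shell sketch:

  val=000000000000000100000000   # trusted.afr.data01-client-0 from the first file, without the 0x prefix
  echo "data=$((16#${val:0:8})) metadata=$((16#${val:8:8})) entry=$((16#${val:16:8}))"
  # prints: data=0 metadata=1 entry=0 (the second file's value is all zeroes)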
# gluster volume heal data01
Launching heal operation to perform index self heal on volume data01 has been successful
Use heal info commands to check status

# cat /var/log/glusterfs/glustershd.log
[2017-11-12 08:39:01.907287] I [glusterfsd-mgmt.c:1789:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2017-11-15 08:18:02.084766] I [MSGID: 100011] [glusterfsd.c:1414:reincarnate] 0-glusterfsd: Fetching the volume file from server...
[2017-11-15 08:18:02.085718] I [glusterfsd-mgmt.c:1789:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
[2017-11-15 19:13:42.005307] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]
The message "W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]" repeated 5 times between [2017-11-15 19:13:42.005307] and [2017-11-15 19:13:42.166579]
[2017-11-15 19:23:43.041956] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]
The message "W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]" repeated 5 times between [2017-11-15 19:23:43.041956] and [2017-11-15 19:23:43.235831]
[2017-11-15 19:30:22.726808] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]
The message "W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]" repeated 4 times between [2017-11-15 19:30:22.726808] and [2017-11-15 19:30:22.827631]
[2017-11-16 15:04:34.102010] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-data01-replicate-0: performing metadata selfheal on 9612ecd2-106d-42f2-95eb-fef495c1d8ab
[2017-11-16 15:04:34.186781] I [MSGID: 108026] [afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0: Completed metadata selfheal on 9612ecd2-106d-42f2-95eb-fef495c1d8ab. sources=[1] sinks=0
[2017-11-16 15:04:38.776070] I [MSGID: 108026] [afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0: Completed data selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54. sources=[1] sinks=0
[2017-11-16 15:04:38.811744] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-data01-replicate-0: performing metadata selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54
[2017-11-16 15:04:38.867474] I [MSGID: 108026] [afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0: Completed metadata selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54. sources=[1] sinks=0

On Thu, Nov 16, 2017 at 7:14 AM, Ravishankar N <[email protected]> wrote:
>
> On 11/16/2017 04:12 PM, Nithya Balachandran wrote:
>
> On 15 November 2017 at 19:57, Frederic Harmignies
> <[email protected]> wrote:
>
>> Hello, we have 2x files that are missing from one of the bricks. No idea
>> how to fix this.
>>
>> Details:
>>
>> # gluster volume info
>>
>> Volume Name: data01
>> Type: Replicate
>> Volume ID: 39b4479c-31f0-4696-9435-5454e4f8d310
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: 192.168.186.11:/mnt/AIDATA/data
>> Brick2: 192.168.186.12:/mnt/AIDATA/data
>> Options Reconfigured:
>> performance.cache-refresh-timeout: 30
>> client.event-threads: 16
>> server.event-threads: 32
>> performance.readdir-ahead: off
>> performance.io-thread-count: 32
>> performance.cache-size: 32GB
>> transport.address-family: inet
>> nfs.disable: on
>> features.trash: off
>> features.trash-max-filesize: 500MB
>>
>> # gluster volume heal data01 info
>> Brick 192.168.186.11:/mnt/AIDATA/data
>> Status: Connected
>> Number of entries: 0
>>
>> Brick 192.168.186.12:/mnt/AIDATA/data
>> <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54>
>> <gfid:9612ecd2-106d-42f2-95eb-fef495c1d8ab>
>> Status: Connected
>> Number of entries: 2
>>
>> # gluster volume heal data01 info split-brain
>> Brick 192.168.186.11:/mnt/AIDATA/data
>> Status: Connected
>> Number of entries in split-brain: 0
>>
>> Brick 192.168.186.12:/mnt/AIDATA/data
>> Status: Connected
>> Number of entries in split-brain: 0
>>
>> Both files are missing from the folder on Brick1, and the gfid files are also
>> missing from the .glusterfs directory on that same Brick1.
>> Brick2 has both the files and the gfid files in .glusterfs.
>>
>> We already tried:
>>
>> # gluster volume heal data01 full
>> Running a stat and ls -l on both files from a mounted client to try and
>> trigger a heal.
>>
>> Would a rebalance fix this? Any guidance would be greatly appreciated!
>>
>
> A rebalance would not help here as this is a replicate volume. Ravi, any
> idea what could be going wrong here?
>
> No, an explicit lookup should have healed the file on the missing brick,
> unless the lookup did not hit AFR and was served from caching translators.
> Frederic, what version of gluster are you running? Can you launch 'gluster
> volume heal' and check the glustershd logs for possible warnings? Use DEBUG
> client-log-level if you have to. Also, instead of stat, try a getfattr on
> the file from the mount.
>
> -Ravi
>
> Regards,
> Nithya
>
>> Thank you in advance!
>>
>> --
>>
>> *Frederic Harmignies*
>> *High Performance Computer Administrator*
>>
>> www.elementai.com
>>
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
--

*Frederic Harmignies*
*High Performance Computer Administrator*

www.elementai.com
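PS: following Ravi's suggestion above, this is roughly how we plan to raise the client log level while re-triggering the lookup from a client mount (the path below is a placeholder, not the real file):

  gluster volume set data01 diagnostics.client-log-level DEBUG
  getfattr -d -m . -e hex /path/on/client/mount/affected-file   # run from the FUSE mount; placeholder path
  gluster volume reset data01 diagnostics.client-log-level      # restore the default log level afterwards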
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
