Hello, looks like the full heal fixed the problem, I was just impatient :)

[2017-11-16 15:04:34.102010] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-data01-replicate-0: performing metadata selfheal on 9612ecd2-106d-42f2-95eb-fef495c1d8ab
[2017-11-16 15:04:34.186781] I [MSGID: 108026] [afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0: Completed metadata selfheal on 9612ecd2-106d-42f2-95eb-fef495c1d8ab. sources=[1] sinks=0
[2017-11-16 15:04:38.776070] I [MSGID: 108026] [afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0: Completed data selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54. sources=[1] sinks=0
[2017-11-16 15:04:38.811744] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-data01-replicate-0: performing metadata selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54
[2017-11-16 15:04:38.867474] I [MSGID: 108026] [afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0: Completed metadata selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54. sources=[1] sinks=0
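(For anyone hitting the same situation later: two related commands that can confirm a full crawl has finished and the heal backlog is empty; the volume name data01 is taken from this thread.)

# gluster volume heal data01 statistics
# gluster volume heal data01 statistics heal-count

The statistics output shows the state of the most recent crawl, and heal-count should report zero pending entries per brick once everything has been healed.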
# gluster volume heal data01 info
Brick 192.168.186.11:/mnt/AIDATA/data
Status: Connected
Number of entries: 0

Brick 192.168.186.12:/mnt/AIDATA/data
Status: Connected
Number of entries: 0

Thank you for your fast response!

On Thu, Nov 16, 2017 at 10:13 AM, Frederic Harmignies <[email protected]> wrote:

> Hello, we are using glusterfs 3.10.3.
>
> We currently have a full heal ('gluster volume heal data01 full') running; the crawl is still in progress.
>
> Starting time of crawl: Tue Nov 14 15:58:35 2017
>
> Crawl is in progress
> Type of crawl: FULL
> No. of entries healed: 0
> No. of entries in split-brain: 0
> No. of heal failed entries: 0
>
> getfattr output for both files:
>
> # getfattr -d -m . -e hex /mnt/AIDATA/data//ishmaelb/experiments/omie/omieali/cifar10/donsker_grad_reg_ali_dcgan_stat_dcgan_ac_True/omieali_cifar10_zdim_100_enc_dcgan_dec_dcgan_stat_dcgan_posterior_propagated_enc_beta1.0_dec_beta_1.0_info_metric_donsker_varadhan_info_lam_0.334726025306_222219-23_10_17/data/data_gen_iter_86000.pkl
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/AIDATA/data//ishmaelb/experiments/omie/omieali/cifar10/donsker_grad_reg_ali_dcgan_stat_dcgan_ac_True/omieali_cifar10_zdim_100_enc_dcgan_dec_dcgan_stat_dcgan_posterior_propagated_enc_beta1.0_dec_beta_1.0_info_metric_donsker_varadhan_info_lam_0.334726025306_222219-23_10_17/data/data_gen_iter_86000.pkl
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.data01-client-0=0x000000000000000100000000
> trusted.gfid=0x7e8513f4d4e24e66b0ba2dbe4c803c54
>
> # getfattr -d -m . -e hex /mnt/AIDATA/data/home/allac/experiments/171023_105655_mini_imagenet_projection_size_mixing_depth_num_filters_filter_size_block_depth_Explore\ architecture\ capacity/Explore\ architecture\ capacity\(projection_size\=32\;mixing_depth\=0\;num_filters\=64\;filter_size\=3\;block_depth\=3\)/model.ckpt-70001.data-00000-of-00001.tempstate1629411508065733704
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/AIDATA/data/home/allac/experiments/171023_105655_mini_imagenet_projection_size_mixing_depth_num_filters_filter_size_block_depth_Explore architecture capacity/Explore architecture capacity(projection_size=32;mixing_depth=0;num_filters=64;filter_size=3;block_depth=3)/model.ckpt-70001.data-00000-of-00001.tempstate1629411508065733704
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.data01-client-0=0x000000000000000000000000
> trusted.bit-rot.version=0x02000000000000005979d278000af1e7
> trusted.gfid=0x9612ecd2106d42f295ebfef495c1d8ab
>
> # gluster volume heal data01
> Launching heal operation to perform index self heal on volume data01 has been successful
> Use heal info commands to check status
>
> # cat /var/log/glusterfs/glustershd.log
> [2017-11-12 08:39:01.907287] I [glusterfsd-mgmt.c:1789:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
> [2017-11-15 08:18:02.084766] I [MSGID: 100011] [glusterfsd.c:1414:reincarnate] 0-glusterfsd: Fetching the volume file from server...
> [2017-11-15 08:18:02.085718] I [glusterfsd-mgmt.c:1789:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
> [2017-11-15 19:13:42.005307] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote operation failed.
> Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]
> The message "W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]" repeated 5 times between [2017-11-15 19:13:42.005307] and [2017-11-15 19:13:42.166579]
> [2017-11-15 19:23:43.041956] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]
> The message "W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]" repeated 5 times between [2017-11-15 19:23:43.041956] and [2017-11-15 19:23:43.235831]
> [2017-11-15 19:30:22.726808] W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]
> The message "W [MSGID: 114031] [client-rpc-fops.c:2928:client3_3_lookup_cbk] 0-data01-client-0: remote operation failed. Path: <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54> (7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54) [No such file or directory]" repeated 4 times between [2017-11-15 19:30:22.726808] and [2017-11-15 19:30:22.827631]
> [2017-11-16 15:04:34.102010] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-data01-replicate-0: performing metadata selfheal on 9612ecd2-106d-42f2-95eb-fef495c1d8ab
> [2017-11-16 15:04:34.186781] I [MSGID: 108026] [afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0: Completed metadata selfheal on 9612ecd2-106d-42f2-95eb-fef495c1d8ab. sources=[1] sinks=0
> [2017-11-16 15:04:38.776070] I [MSGID: 108026] [afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0: Completed data selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54. sources=[1] sinks=0
> [2017-11-16 15:04:38.811744] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-data01-replicate-0: performing metadata selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54
> [2017-11-16 15:04:38.867474] I [MSGID: 108026] [afr-self-heal-common.c:1255:afr_log_selfheal] 0-data01-replicate-0: Completed metadata selfheal on 7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54. sources=[1] sinks=0
>
> On Thu, Nov 16, 2017 at 7:14 AM, Ravishankar N <[email protected]> wrote:
>
>> On 11/16/2017 04:12 PM, Nithya Balachandran wrote:
>>
>> On 15 November 2017 at 19:57, Frederic Harmignies <[email protected]> wrote:
>>
>>> Hello, we have 2 files that are missing from one of the bricks and no idea how to fix this.
>>>
>>> Details:
>>>
>>> # gluster volume info
>>>
>>> Volume Name: data01
>>> Type: Replicate
>>> Volume ID: 39b4479c-31f0-4696-9435-5454e4f8d310
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: 192.168.186.11:/mnt/AIDATA/data
>>> Brick2: 192.168.186.12:/mnt/AIDATA/data
>>> Options Reconfigured:
>>> performance.cache-refresh-timeout: 30
>>> client.event-threads: 16
>>> server.event-threads: 32
>>> performance.readdir-ahead: off
>>> performance.io-thread-count: 32
>>> performance.cache-size: 32GB
>>> transport.address-family: inet
>>> nfs.disable: on
>>> features.trash: off
>>> features.trash-max-filesize: 500MB
>>>
>>> # gluster volume heal data01 info
>>> Brick 192.168.186.11:/mnt/AIDATA/data
>>> Status: Connected
>>> Number of entries: 0
>>>
>>> Brick 192.168.186.12:/mnt/AIDATA/data
>>> <gfid:7e8513f4-d4e2-4e66-b0ba-2dbe4c803c54>
>>> <gfid:9612ecd2-106d-42f2-95eb-fef495c1d8ab>
>>> Status: Connected
>>> Number of entries: 2
>>>
>>> # gluster volume heal data01 info split-brain
>>> Brick 192.168.186.11:/mnt/AIDATA/data
>>> Status: Connected
>>> Number of entries in split-brain: 0
>>>
>>> Brick 192.168.186.12:/mnt/AIDATA/data
>>> Status: Connected
>>> Number of entries in split-brain: 0
>>>
>>> Both files are missing from the folder on Brick1, and the gfid files are also missing from the .glusterfs directory on that same brick. Brick2 has both the files and the gfid files in .glusterfs.
>>>
>>> We already tried:
>>>
>>> # gluster volume heal data01 full
>>> Running a stat and an ls -l on both files from a mounted client to try and trigger a heal.
>>>
>>> Would a rebalance fix this? Any guidance would be greatly appreciated!
>>
>> A rebalance would not help here as this is a replicate volume. Ravi, any idea what could be going wrong here?
>>
>> No, an explicit lookup should have healed the file on the missing brick, unless the lookup did not hit AFR and was served from the caching translators. Frederic, what version of gluster are you running? Can you launch 'gluster volume heal' and check the glustershd logs for possible warnings? Use the DEBUG client-log-level if you have to. Also, instead of stat, try a getfattr on the file from the mount.
>>
>> -Ravi
>>
>> Regards,
>> Nithya
>>
>>> Thank you in advance!

--
Frederic Harmignies
High Performance Computer Administrator
www.elementai.com
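For reference, a minimal sketch of the checks Ravi suggests above, using the volume name data01 from this thread; the client mount point and file path in the last command are placeholders, not actual paths from this setup.

Raise the client-side log level while debugging, and reset it when done:
# gluster volume set data01 diagnostics.client-log-level DEBUG
# gluster volume reset data01 diagnostics.client-log-level

Launch an index heal and watch the self-heal daemon log for warnings:
# gluster volume heal data01
# tail -f /var/log/glusterfs/glustershd.log

Force a named lookup on an affected file from a client mount instead of a plain stat:
# getfattr -d -m . -e hex /mnt/data01/path/to/affected/file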
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
