I noticed that my new brick (replacement disk) did not have a .shard directory created on the brick, if that helps.

I removed the affected brick from the volume, wiped the disk, and did an add-brick, and everything healed right up. I didn't try to set any attrs or anything else; I just removed the brick and added it back as new.
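For anyone who wants the exact sequence, a rough sketch of what I mean (the volume name, replica counts, and brick path below are placeholders, not taken from this thread; adjust to your own layout):

    # drop the failed brick, shrinking the replica count by one
    gluster volume remove-brick VOLNAME replica 2 server3:/gluster1/BRICK1/1 force

    # wipe and re-create the filesystem on the replacement disk, then re-add
    # the brick as new, restoring the original replica count
    gluster volume add-brick VOLNAME replica 3 server3:/gluster1/BRICK1/1

    # optionally kick a full heal to populate the new brick
    gluster volume heal VOLNAME full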
> On Aug 29, 2016, at 9:49 AM, Darrell Budic <bu...@onholyground.com> wrote:
> 
> Just to let you know, I'm seeing the same issue under 3.7.14 on CentOS 7. Some
> content was healed correctly; now all the shards are queued up in a heal
> list, but nothing is healing. Got brick errors similar to the ones David was
> getting, logged on the brick that isn't healing:
> 
> [2016-08-29 03:31:40.436110] E [MSGID: 115050] [server-rpc-fops.c:179:server_lookup_cbk] 0-gv0-rep-server: 1613822: LOOKUP (null) (00000000-0000-0000-0000-000000000000/0f61bf63-8ef1-4e53-8bc3-6d46590c4fb1.29) ==> (Invalid argument) [Invalid argument]
> [2016-08-29 03:31:43.005013] E [MSGID: 115050] [server-rpc-fops.c:179:server_lookup_cbk] 0-gv0-rep-server: 1616802: LOOKUP (null) (00000000-0000-0000-0000-000000000000/0f61bf63-8ef1-4e53-8bc3-6d46590c4fb1.40) ==> (Invalid argument) [Invalid argument]
> 
> This was after replacing the drive the brick was on and trying to get it back
> into the system by setting the volume's fattr on the brick dir. I'll try the
> method suggested here on it shortly.
> 
> -Darrell
> 
> 
>> On Aug 29, 2016, at 7:25 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>> 
>> Got it. Thanks.
>> 
>> I tried the same test and shd crashed with SIGABRT (well, that's because I
>> compiled from src with -DDEBUG). In any case, this error would prevent full
>> heal from proceeding further. I'm debugging the crash now. Will let you know
>> when I have the RC.
>> 
>> -Krutika
>> 
>> On Mon, Aug 29, 2016 at 5:47 PM, David Gossage <dgoss...@carouselchecks.com> wrote:
>> 
>> On Mon, Aug 29, 2016 at 7:14 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>> On Mon, Aug 29, 2016 at 5:25 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>> Could you attach both client and brick logs? Meanwhile I will try these
>> steps out on my machines and see if it is easily recreatable.
>> 
>> 
>> Hoping 7z files are accepted by the mail server.
>> 
>> Looks like the zip file is awaiting approval due to size.
>> 
>> -Krutika
>> 
>> On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <dgoss...@carouselchecks.com> wrote:
>> CentOS 7, Gluster 3.8.3
>> 
>> Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>> Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>> Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>> Options Reconfigured:
>> cluster.data-self-heal-algorithm: full
>> cluster.self-heal-daemon: on
>> cluster.locking-scheme: granular
>> features.shard-block-size: 64MB
>> features.shard: on
>> performance.readdir-ahead: on
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: on
>> cluster.eager-lock: enable
>> network.remote-dio: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> server.allow-insecure: on
>> cluster.self-heal-window-size: 1024
>> cluster.background-self-heal-count: 16
>> performance.strict-write-ordering: off
>> nfs.disable: on
>> nfs.addr-namelookup: off
>> nfs.enable-ino32: off
>> cluster.granular-entry-heal: on
>> 
>> Friday I did a rolling upgrade to 3.8.3 with no issues. Following the steps
>> detailed in previous recommendations, I began the process of replacing and
>> healing bricks one node at a time (a sketch of the commands follows the list):
>> 
>> 1) kill the pid of the brick
>> 2) reconfigure the brick from raid6 to raid10
>> 3) recreate the directory of the brick
>> 4) gluster volume start <> force
>> 5) gluster volume heal <> full
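>> A minimal sketch of those five steps as commands (GLUSTER1 matches the logs
>> below, the brick path matches the volume info above, and the brick PID comes
>> from volume status; treat all of them as placeholders for your own setup):
>> 
>>     gluster volume status GLUSTER1         # note the PID of this node's brick
>>     kill <brick-pid>                       # 1) kill pid of brick
>>     # 2) rebuild the underlying storage from raid6 to raid10 (outside gluster)
>>     mkdir -p /gluster1/BRICK1/1            # 3) recreate directory of brick
>>     gluster volume start GLUSTER1 force    # 4) restart the downed brick process
>>     gluster volume heal GLUSTER1 full      # 5) trigger the full heal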
>> The 1st node worked as expected and took 12 hours to heal 1TB of data. Load
>> was a little heavy but nothing shocking.
>> 
>> About an hour after node 1 finished, I began the same process on node 2. The
>> heal process kicked in as before, and the files in the directories visible
>> from the mount and in .glusterfs healed in short order. Then it began the
>> crawl of .shard, adding those files to the heal count, at which point the
>> entire process basically ground to a halt. After 48 hours, out of 19k shards
>> it has added 5900 to the heal list. Load on all 3 machines is negligible. It
>> was suggested to change cluster.data-self-heal-algorithm to full and restart
>> the volume, which I did. No effect. Relaunching the heal also had no effect,
>> regardless of which node I picked. I started each VM and performed a stat of
>> all files from within it, or a full virus scan, and that seemed to cause
>> short, small spikes in shards added, but not by much. The logs show no real
>> messages indicating anything is going on. I get occasional hits in the brick
>> log for null lookups, which makes me think it's not really crawling the
>> .shard directory but waiting for a shard lookup to add each one. I'll get the
>> following in the brick log, though not constantly, and sometimes multiple
>> entries for the same shard:
>> 
>> [2016-08-29 08:31:57.478125] W [MSGID: 115009] [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no resolution type for (null) (LOOKUP)
>> [2016-08-29 08:31:57.478170] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: 12591783: LOOKUP (null) (00000000-0000-0000-0000-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) ==> (Invalid argument) [Invalid argument]
>> 
>> This one repeated about 30 times in a row, then nothing for 10 minutes, then
>> a single hit for a different shard by itself.
>> 
>> How can I determine if the heal is actually running? How can I kill it or
>> force a restart? Does the node I start it from determine which directory
>> gets crawled to determine heals?
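>> The standard heal-monitoring commands I know of, for reference (GLUSTER1 is
>> my volume name; none of these were specifically suggested in this thread, so
>> treat them as a starting point rather than a confirmed fix):
>> 
>>     gluster volume heal GLUSTER1 info                   # entries still pending heal, per brick
>>     gluster volume heal GLUSTER1 statistics             # crawl state and heal counts, per brick
>>     gluster volume heal GLUSTER1 statistics heal-count  # just the pending totals
>>     gluster volume heal GLUSTER1                        # re-trigger an index heal
>>     gluster volume heal GLUSTER1 full                   # re-trigger the full crawl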
>> David Gossage
>> Carousel Checks Inc. | System Administrator
>> Office 708.613.2284

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users