On Sat, Aug 27, 2016 at 9:58 AM, David Gossage <[email protected]> wrote:
> On Fri, Aug 26, 2016 at 8:40 PM, David Gossage <[email protected]> wrote:
>
>> I was in the process of redoing the underlying disk layout for a brick.
>> Triggered a full heal, then realized I had skipped a step of applying zfs
>> set xattr=sa, which is kind of important when running zfs under linux.
>>
>> Rather than wait however many hours until my TB of data heals, is there a
>> command in 3.8 to cancel a heal begun by gluster volume heal GLUSTER1
>> full? If not, it won't be the end of the world, just a waste of time to
>> wait and then have to redo it after writing out a TB of data.
>>
>>
> Does the heal process crawl from any particular node when invoked? I have
> 3 nodes. I ran the command from node 3, node 2 is the one with files
> needing to be healed, and node 1 is the brick I healed yesterday but forgot
> to set xattr=sa on, which usually has bad performance results for
> zfsonlinux. I did set it about 30 minutes into the heal, figuring better
> some than none until I could redo it again.
>
> 12 hours later the 1TB of data was healed, so I figured I'd move on to node
> 2, then 3. Then, assuming 12 hour windows for each node, I could redo node 1
> with correct settings before Monday. When node 1 healed, it first found all
> the visible files from the mount point and .glusterfs, then the numbers
> jumped back up after those were done and it started finding shards. It
> happened fairly quickly. The 2nd time around, with node 2, it is crawling
> to a standstill while finding all the shards to heal. I'm wondering if it's
> doing the crawl from node 1 and the poor settings that existed for the
> first 30 minutes of file heals are slowing it down. If so, I would hope
> that once the files that were created/healed while the settings weren't
> correct are found and it moves past them, the rest should go faster.
>
> The only errors in any logs are in the brick logs:
>
> [2016-08-27 14:25:10.022786] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: 3251237: LOOKUP (null) (00000000-0000-0000-0000-000000000000/4c7d44fc-a0c1-413b-8dc4-2abbbe1d4d4f.423) ==> (Invalid argument) [Invalid argument]
> [2016-08-27 14:36:59.234073] W [MSGID: 115009] [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no resolution type for (null) (LOOKUP)
> [2016-08-27 14:36:59.234128] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: 3288322: LOOKUP (null) (00000000-0000-0000-0000-000000000000/4c7d44fc-a0c1-413b-8dc4-2abbbe1d4d4f.328) ==> (Invalid argument) [Invalid argument]
>
> And I would hope that it's just related to the heal process, or that when a
> shard is hit and it's found not to exist here, it errors out as expected.
>
>

7 hours after starting the full heal, shards still haven't started healing,
and the count from heal statistics heal-count has only reached 1800 out of
19000 shards. The shards dir hasn't even been recreated yet. Creation of the
non-sharded stubs (do they have a more official term?) in the visible mount
point was as speedy as expected. Shards are painfully slow.

>> *David Gossage*
>> *Carousel Checks Inc. | System Administrator*
>> *Office* 708.613.2284
>>
>
>
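
For a rough sense of scale, the heal-count figures above (1800 of 19000 shards found after 7 hours) can be turned into a crude estimate of the remaining crawl time. The sketch below is only that: it assumes the crawl continues at a constant rate, which may well not hold, and it assumes the heal-count output consists of "Brick ..." lines each followed by a "Number of entries: N" line. The brick paths in the sample text and both function names are hypothetical, made up for illustration.

```python
from typing import Dict, Optional


def parse_heal_count(output: str) -> Dict[str, int]:
    """Map each brick to its pending entry count.

    Assumes output shaped like:
        Brick host:/path
        Number of entries: N
    """
    counts: Dict[str, int] = {}
    brick = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif line.startswith("Number of entries:") and brick is not None:
            value = line.split(":", 1)[1].strip()
            # Skip non-numeric values rather than guess what they mean.
            if value.isdigit():
                counts[brick] = int(value)
            brick = None
    return counts


def estimate_hours_remaining(found: int, total: int,
                             hours_elapsed: float) -> Optional[float]:
    """Linear extrapolation: remaining entries divided by entries per hour."""
    if found <= 0 or hours_elapsed <= 0:
        return None  # no rate can be derived yet
    rate = found / hours_elapsed
    return max(total - found, 0) / rate


# Hypothetical sample text; brick names and paths are invented.
SAMPLE = """\
Brick node1:/gluster1/brick1
Number of entries: 0

Brick node2:/gluster1/brick1
Number of entries: 1800
"""

if __name__ == "__main__":
    counts = parse_heal_count(SAMPLE)
    found = sum(counts.values())
    eta = estimate_hours_remaining(found, total=19000, hours_elapsed=7)
    print(counts, eta)
```

With the numbers from this thread, the extrapolation comes out to roughly 67 more hours just to find the remaining shards, which is why the 12 hour per node plan no longer looks realistic unless the crawl speeds up.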
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
