While I'm still going through the logs, just wanted to point out a couple of things:
1. It is recommended that you use 3-way replication (replica count 3) for the VM store use case.
2. network.ping-timeout at 5 seconds is way too low. Please change it to 30.

Also, is there any specific reason for using NFS-Ganesha over gfapi/FUSE?

I will get back to you with anything else I find, or with more questions if I have any.

-Krutika
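For reference, the ping-timeout change is a single volume-set call, and the move to replica 3 would be an add-brick run with "replica 3" and one new brick for each of the 22 existing replica pairs. A minimal sketch of the timeout change, assuming the vmware2 volume name from the info quoted below:

  # raise the ping timeout back to the 30-second default, then confirm it
  gluster volume set vmware2 network.ping-timeout 30
  gluster volume get vmware2 network.ping-timeout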
On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <[email protected]> wrote:

> Thanks mate,
>
> Kindly, check the attachment.
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> ------------------------------
> *From:* Krutika Dhananjay <[email protected]>
> *Sent:* Sunday, March 19, 2017 10:00:22 AM
> *To:* Mahdi Adnan
> *Cc:* [email protected]
> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>
> In that case could you share the ganesha-gfapi logs?
>
> -Krutika
>
> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <[email protected]> wrote:
>
>> I have two volumes, one is mounted using libgfapi for ovirt mount, the other one is exported via NFS-Ganesha for VMWare which is the one im testing now.
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> ------------------------------
>> *From:* Krutika Dhananjay <[email protected]>
>> *Sent:* Sunday, March 19, 2017 8:02:19 AM
>> *To:* Mahdi Adnan
>> *Cc:* [email protected]
>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <[email protected]> wrote:
>>
>>> Kindly, check the attached new log file, i dont know if it's helpful or not but, i couldn't find the log with the name you just described.
>>>
>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it NFS?
>>
>> -Krutika
>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> ------------------------------
>>> *From:* Krutika Dhananjay <[email protected]>
>>> *Sent:* Saturday, March 18, 2017 6:10:40 PM
>>> *To:* Mahdi Adnan
>>> *Cc:* [email protected]
>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>
>>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the fuse mount logs? It should be right under the /var/log/glusterfs/ directory, named after the mount point name, only hyphenated.
>>>
>>> -Krutika
>>>
>>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <[email protected]> wrote:
>>>
>>>> Hello Krutika,
>>>>
>>>> Kindly, check the attached logs.
>>>>
>>>> --
>>>>
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> ------------------------------
>>>> *From:* Krutika Dhananjay <[email protected]>
>>>> *Sent:* Saturday, March 18, 2017 3:29:03 PM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* [email protected]
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> Hi Mahdi,
>>>>
>>>> Could you attach mount, brick and rebalance logs?
>>>>
>>>> -Krutika
>>>>
>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have upgraded to Gluster 3.8.10 today and ran the add-brick procedure in a volume contains few VMs.
>>>>> After the completion of rebalance, i have rebooted the VMs, some of ran just fine, and others just crashed.
>>>>> Windows boot to recovery mode and Linux throw xfs errors and does not boot.
>>>>> I ran the test again and it happened just as the first one, but i have noticed only VMs doing disk IOs are affected by this bug.
>>>>> The VMs in power off mode started fine and even md5 of the disk file did not change after the rebalance.
>>>>>
>>>>> anyone else can confirm this ?
>>>>>
>>>>> Volume info:
>>>>>
>>>>> Volume Name: vmware2
>>>>> Type: Distributed-Replicate
>>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>>>>> Status: Started
>>>>> Snapshot Count: 0
>>>>> Number of Bricks: 22 x 2 = 44
>>>>> Transport-type: tcp
>>>>> Bricks:
>>>>> Brick1: gluster01:/mnt/disk1/vmware2
>>>>> Brick2: gluster03:/mnt/disk1/vmware2
>>>>> Brick3: gluster02:/mnt/disk1/vmware2
>>>>> Brick4: gluster04:/mnt/disk1/vmware2
>>>>> Brick5: gluster01:/mnt/disk2/vmware2
>>>>> Brick6: gluster03:/mnt/disk2/vmware2
>>>>> Brick7: gluster02:/mnt/disk2/vmware2
>>>>> Brick8: gluster04:/mnt/disk2/vmware2
>>>>> Brick9: gluster01:/mnt/disk3/vmware2
>>>>> Brick10: gluster03:/mnt/disk3/vmware2
>>>>> Brick11: gluster02:/mnt/disk3/vmware2
>>>>> Brick12: gluster04:/mnt/disk3/vmware2
>>>>> Brick13: gluster01:/mnt/disk4/vmware2
>>>>> Brick14: gluster03:/mnt/disk4/vmware2
>>>>> Brick15: gluster02:/mnt/disk4/vmware2
>>>>> Brick16: gluster04:/mnt/disk4/vmware2
>>>>> Brick17: gluster01:/mnt/disk5/vmware2
>>>>> Brick18: gluster03:/mnt/disk5/vmware2
>>>>> Brick19: gluster02:/mnt/disk5/vmware2
>>>>> Brick20: gluster04:/mnt/disk5/vmware2
>>>>> Brick21: gluster01:/mnt/disk6/vmware2
>>>>> Brick22: gluster03:/mnt/disk6/vmware2
>>>>> Brick23: gluster02:/mnt/disk6/vmware2
>>>>> Brick24: gluster04:/mnt/disk6/vmware2
>>>>> Brick25: gluster01:/mnt/disk7/vmware2
>>>>> Brick26: gluster03:/mnt/disk7/vmware2
>>>>> Brick27: gluster02:/mnt/disk7/vmware2
>>>>> Brick28: gluster04:/mnt/disk7/vmware2
>>>>> Brick29: gluster01:/mnt/disk8/vmware2
>>>>> Brick30: gluster03:/mnt/disk8/vmware2
>>>>> Brick31: gluster02:/mnt/disk8/vmware2
>>>>> Brick32: gluster04:/mnt/disk8/vmware2
>>>>> Brick33: gluster01:/mnt/disk9/vmware2
>>>>> Brick34: gluster03:/mnt/disk9/vmware2
>>>>> Brick35: gluster02:/mnt/disk9/vmware2
>>>>> Brick36: gluster04:/mnt/disk9/vmware2
>>>>> Brick37: gluster01:/mnt/disk10/vmware2
>>>>> Brick38: gluster03:/mnt/disk10/vmware2
>>>>> Brick39: gluster02:/mnt/disk10/vmware2
>>>>> Brick40: gluster04:/mnt/disk10/vmware2
>>>>> Brick41: gluster01:/mnt/disk11/vmware2
>>>>> Brick42: gluster03:/mnt/disk11/vmware2
>>>>> Brick43: gluster02:/mnt/disk11/vmware2
>>>>> Brick44: gluster04:/mnt/disk11/vmware2
>>>>> Options Reconfigured:
>>>>> cluster.server-quorum-type: server
>>>>> nfs.disable: on
>>>>> performance.readdir-ahead: on
>>>>> transport.address-family: inet
>>>>> performance.quick-read: off
>>>>> performance.read-ahead: off
>>>>> performance.io-cache: off
>>>>> performance.stat-prefetch: off
>>>>> cluster.eager-lock: enable
>>>>> network.remote-dio: enable
>>>>> features.shard: on
>>>>> cluster.data-self-heal-algorithm: full
>>>>> features.cache-invalidation: on
>>>>> ganesha.enable: on
>>>>> features.shard-block-size: 256MB
>>>>> client.event-threads: 2
>>>>> server.event-threads: 2
>>>>> cluster.favorite-child-policy: size
>>>>> storage.build-pgfid: off
>>>>> network.ping-timeout: 5
>>>>> cluster.enable-shared-storage: enable
>>>>> nfs-ganesha: enable
>>>>> cluster.server-quorum-ratio: 51%
>>>>>
>>>>> Adding bricks:
>>>>> gluster volume add-brick vmware2 replica 2 gluster01:/mnt/disk11/vmware2 gluster03:/mnt/disk11/vmware2 gluster02:/mnt/disk11/vmware2 gluster04:/mnt/disk11/vmware2
>>>>>
>>>>> starting fix layout:
>>>>> gluster volume rebalance vmware2 fix-layout start
>>>>>
>>>>> Starting rebalance:
>>>>> gluster volume rebalance vmware2 start
>>>>>
>>>>> --
>>>>>
>>>>> Respectfully
>>>>> *Mahdi A. Mahdi*
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> [email protected]
>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
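For anyone retracing the add-brick and rebalance procedure quoted above, it is worth confirming that the rebalance has completed and that no heals are pending before rebooting any VMs. A minimal sketch of those checks, again assuming the vmware2 volume from the quoted info; they are a suggestion rather than part of the original procedure:

  # all nodes should report "completed" before the VMs are touched
  gluster volume rebalance vmware2 status

  # list any files with pending self-heals
  gluster volume heal vmware2 info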
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
