*David Gossage*
*Carousel Checks Inc. | System Administrator*
*Office* 708.613.2284
On Thu, May 19, 2016 at 7:25 PM, Kevin Lemonnier <[email protected]> wrote:

> The I/O errors are happening after the heal, not during it.
> As described, I just rebooted a node, waited for the heal to finish,
> rebooted another, waited for the heal to finish, then rebooted the third.
> From that point, the VM just has a lot of I/O errors showing whenever I
> use the disk a lot (importing big MySQL dumps). The VM "screen" on the
> console tab of Proxmox just spams I/O errors from that point, which it
> didn't do before rebooting the Gluster nodes. I tried to power off the VM
> and force full heals, but I didn't find a way to fix the problem short of
> deleting the VM disk and restoring it from a backup.
>
> I have 3 other servers on 3.7.6 where that problem isn't happening, so it
> might be a 3.7.11 bug, but since the RAID card failed recently on one of
> the nodes, I'm not really sure some other piece of hardware isn't at
> fault. Unfortunately, I don't have the hardware to test that.
> The only way to be sure would be to upgrade the 3.7.6 nodes to 3.7.11 and
> repeat the same tests, but those nodes are in production, and the VM
> freezes during the heal last month already caused huge problems for our
> clients. We really can't afford any other problems there, so testing on
> them isn't an option.
>

Are the 3.7.11 nodes in production? Could they be downgraded to 3.7.6 to see if the problem still occurs?

> To sum up, I have 3 nodes on 3.7.6 with no corruption happening but huge
> freezes during heals, and 3 other nodes on 3.7.11 with no freezes during
> heals but corruption. qemu-img doesn't see the corruption; it only shows
> on the VM's screen and seems mostly harmless, but sometimes the VM does
> switch to read-only mode, saying it had too many I/O errors.
>
> Would the bitrot detection daemon detect a hardware problem?
> I did enable it, but it didn't detect anything. I don't know how to force
> a check on it, so I have no idea whether it ran a scrub since the
> corruption happened.
>
>
> On Thu, May 19, 2016 at 04:04:49PM -0400, Alastair Neil wrote:
> > I am slightly confused: you say you have image file corruption, but then
> > you say the qemu-img check says there is no corruption. If what you mean
> > is that you see I/O errors during a heal, this is likely to be due to I/O
> > starvation, which is a well-known issue.
> > There is work happening to improve this in version 3.8:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1269461
> >
> > On 19 May 2016 at 09:58, Kevin Lemonnier <[email protected]> wrote:
> > >
> > > That's a different problem then. I have corruption without removing or
> > > adding bricks, as mentioned. These might be two separate issues.
> > >
> > > On Thu, May 19, 2016 at 11:25:34PM +1000, Lindsay Mathieson wrote:
> > > > On 19/05/2016 12:17 AM, Lindsay Mathieson wrote:
> > > >
> > > > > One thought: since the VMs are active while the brick is
> > > > > removed/re-added, could it be the shards that are written while the
> > > > > brick is added that are the reverse healing shards?
> > > >
> > > > I tested by:
> > > >
> > > > - removing brick 3
> > > > - erasing brick 3
> > > > - closing down all VMs
> > > > - adding new brick 3
> > > > - waiting until the heal number reached its max and started
> > > >   decreasing
> > > >
> > > >   There were no reverse heals.
> > > >
> > > > - Started the VMs back up. No real issues there, though one showed
> > > >   I/O errors, presumably due to shards being locked as they were
> > > >   healed.
> > > >
> > > > - VMs started OK, no reverse heals were noted, and eventually brick 3
> > > >   was fully healed. The VMs do not appear to be corrupted.
> > > >
> > > > So it would appear the problem is adding a brick while the volume is
> > > > being written to.
> > > >
> > > > Cheers,
> > > >
> > > > --
> > > > Lindsay Mathieson
> > > >
> > > > _______________________________________________
> > > > Gluster-users mailing list
> > > > [email protected]
> > > > http://www.gluster.org/mailman/listinfo/gluster-users
> > >
> > > --
> > > Kevin Lemonnier
> > > PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
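Both Kevin's rolling reboots ("waited for the heal to finish") and Lindsay's brick test ("waiting until the heal number reached its max and started decreasing") gate on the number of pending heal entries. A minimal Python sketch of those two checks follows. It assumes the counts are read from `gluster volume heal <VOLNAME> info`, which prints a `Number of entries: N` line per brick; that output format should be verified against your GlusterFS version, and all function names here are hypothetical.

```python
import re
from typing import List, Optional

# Assumption: `gluster volume heal <VOLNAME> info` prints one
# "Number of entries: N" line per brick. Any line whose count is not a
# plain integer is skipped by this regex.
_ENTRIES_RE = re.compile(r"^Number of entries:\s*(\d+)\s*$", re.MULTILINE)


def pending_heals(heal_info_text: str) -> int:
    """Hypothetical helper: total pending heal entries across all bricks."""
    return sum(int(n) for n in _ENTRIES_RE.findall(heal_info_text))


def peak_index(samples: List[int]) -> Optional[int]:
    """Hypothetical helper for the "reached its max and started decreasing" step.

    Scans pending-heal counts sampled over time and returns the index of
    the highest count seen so far as soon as a later sample is lower.
    Returns None while the count is still rising or flat.
    """
    best_idx: Optional[int] = None
    best = -1
    for i, n in enumerate(samples):
        if n > best:
            best_idx, best = i, n
        elif n < best:
            return best_idx
    return None


def heal_finished(samples: List[int]) -> bool:
    """Hypothetical helper for "wait for the heal to finish": latest count is 0."""
    return bool(samples) and samples[-1] == 0
```

In practice the samples would be collected by periodically running the heal-info command (for example with `subprocess.run`) and passing its stdout to `pending_heals`; that polling loop is left out because it depends on the volume name and the environment.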
