Hi, I will try to recreate this issue tomorrow on my machines with the steps that Lindsay provided in this thread. I will let you know the result soon after that.
-Krutika

On Wednesday, May 18, 2016, Kevin Lemonnier <[email protected]> wrote:
> Hi,
>
> Some news on this.
> Over the weekend the RAID card of the node ipvr2 died, and I thought
> that maybe that was the problem all along. The RAID card was changed
> and yesterday I reinstalled everything.
> Same problem just now.
>
> My test is simple: with the website hosted on the VMs in use the whole time,
> I reboot ipvr50, wait for the heal to complete, migrate all the VMs off
> ipvr2 then reboot it, wait for the heal to complete, then migrate all
> the VMs off ipvr3 then reboot it.
> Every time, the first database VM (which is the only one really using the disk
> during the heal) starts showing I/O errors on its disk.
>
> Am I really the only one with that problem?
> Maybe one of the drives is dying too, who knows, but SMART isn't saying anything.
>
>
> On Thu, May 12, 2016 at 04:03:02PM +0200, Kevin Lemonnier wrote:
>> Hi,
>>
>> I had a problem some time ago with 3.7.6 freezing during heals,
>> and several people advised me to use 3.7.11 instead. Indeed, with that
>> version the freeze problem is fixed, it works like a dream! You can
>> almost not tell that a node is down or healing, everything keeps working
>> except for a little freeze when the node has just gone down and, I assume,
>> hasn't timed out yet, but that's fine.
>>
>> Now I have a 3.7.11 volume on 3 nodes for testing, and the VMs are Proxmox
>> VMs with qcow2 disks stored on the gluster volume.
>> Here is the config:
>>
>> Volume Name: gluster
>> Type: Replicate
>> Volume ID: e4f01509-beaf-447d-821f-957cc5c20c0a
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: ipvr2.client:/mnt/storage/gluster
>> Brick2: ipvr3.client:/mnt/storage/gluster
>> Brick3: ipvr50.client:/mnt/storage/gluster
>> Options Reconfigured:
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> features.shard: on
>> features.shard-block-size: 64MB
>> cluster.data-self-heal-algorithm: full
>> performance.readdir-ahead: on
>>
>>
>> As mentioned, I rebooted one of the nodes to test the freezing issue I had
>> on previous versions and, apart from the initial timeout, nothing: the website
>> hosted on the VMs keeps working like a charm even during heal.
>> Since it's testing, there isn't any load on it though, and I just tried to refresh
>> the database by importing the production one on the two MySQL VMs, and both of them
>> started showing I/O errors. I tried shutting them down and powering them on again,
>> but same thing. Even starting full heals by hand doesn't solve the problem: the disks are
>> corrupted. They still work, but sometimes they remount their partitions read-only.
>>
>> I believe there are a few people already using 3.7.11; has no one noticed corruption problems?
>> Anyone using Proxmox? As already mentioned in multiple other threads on this mailing list
>> by other users, I also pretty much always have shards in heal info, but nothing "stuck" there:
>> they always go away in a few seconds and get replaced by other shards.
>>
>> Thanks
>>
>> --
>> Kevin Lemonnier
>> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>
>
>
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> http://www.gluster.org/mailman/listinfo/gluster-users
>
>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>
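
The test described in the quoted message depends on "wait for the heal to complete" before each reboot. Because heal info on this volume almost always shows a few shards that vanish within seconds, a single empty reading may not mean much. Below is a minimal Python sketch of one way to script that wait, assuming the `gluster volume heal <VOLNAME> info` output contains one `Brick ...` line and one `Number of entries: N` line per brick. The function names, the polling interval and the "three clean polls in a row" rule are illustrative assumptions, not something taken from the thread.

```python
import re
import subprocess
import time

# Assumed output shape: one "Brick ..." header and one
# "Number of entries: N" line per brick.
BRICK_LINE = re.compile(r"^Brick ", re.MULTILINE)
ENTRY_LINE = re.compile(r"^Number of entries:\s*(\d+)\s*$", re.MULTILINE)


def pending_heal_entries(heal_info_output: str) -> int:
    """Sum the per-brick entry counts found in heal info output.

    Raises ValueError when a brick has no numeric count (for example
    because it is unreachable), so that case is never read as "healed".
    """
    bricks = BRICK_LINE.findall(heal_info_output)
    counts = ENTRY_LINE.findall(heal_info_output)
    if not counts or len(counts) != len(bricks):
        raise ValueError("could not read an entry count for every brick")
    return sum(int(n) for n in counts)


def wait_for_heal(volume: str, poll_seconds: int = 10, stable_polls: int = 3) -> None:
    """Block until heal info shows zero entries on several consecutive polls."""
    clean = 0
    while clean < stable_polls:
        result = subprocess.run(
            ["gluster", "volume", "heal", volume, "info"],
            check=True, capture_output=True, text=True,
        )
        try:
            healed = pending_heal_entries(result.stdout) == 0
        except ValueError:
            healed = False  # a brick did not report a count; keep waiting
        clean = clean + 1 if healed else 0
        time.sleep(poll_seconds)
```

With the volume from this thread, `wait_for_heal("gluster")` would be called after each node comes back up and before migrating VMs off the next one.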
