On Wed, Nov 2, 2016 at 7:00 PM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
> Just finished testing the VM storage use case.
>
> *Volume configuration used:*
>
> [root@srv-1 ~]# gluster volume info
>
> Volume Name: rep
> Type: Replicate
> Volume ID: 2c603783-c1da-49b7-8100-0238c777b731
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: srv-1:/bricks/rep1
> Brick2: srv-2:/bricks/rep2
> Brick3: srv-3:/bricks/rep4
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> cluster.eager-lock: enable
> network.remote-dio: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> features.shard: on
> cluster.granular-entry-heal: on
> cluster.locking-scheme: granular
> network.ping-timeout: 30
> server.allow-insecure: on
> storage.owner-uid: 107
> storage.owner-gid: 107
> cluster.data-self-heal-algorithm: full
>
> Used FUSE to mount the volume locally on each of the 3 nodes (no external clients).
> shard-block-size: 4MB.
>
> *TESTS AND RESULTS:*
>
> *What works:*
>
> * Created 3 VM images, one per hypervisor, and installed Fedora 24 on all of them.
>   Used virt-manager for ease of setting up the environment. Installation went fine. All green.
>
> * Rebooted the VMs. Worked fine.
>
> * Killed brick-1. Ran dd on the three VMs to create a 'src' file and captured its md5sum value. Verified that
>   the gfid indices and name indices are created under .glusterfs/indices/xattrop and .glusterfs/indices/entry-changes
>   respectively, as they should be. Brought the brick back up. Waited until heal completed. Captured the md5sum again. They matched.
>
> * Killed brick-2. Copied the 'src' file from the step above into a new file using dd. Captured the md5sum of the newly
>   created file. The checksum matched. Waited for heal to finish. Captured the md5sum again. Everything matched.
>
> * Repeated the test above with brick-3 being killed and brought back up after a while. Worked fine.
>
> At the end I also captured md5sums of the shards from the brick backends on the three replicas. They were all
> in sync. So far so good.
>
> *What did NOT work:*
>
> * Started dd again on all 3 VMs to copy the existing files to new files. While dd was running, I ran replace-brick
>   to replace the third brick with a new brick on the same node with a different path. This caused dd on all three
>   VMs to fail simultaneously with "Input/output error". I tried to read the files; even that failed. Rebooted the
>   VMs. By this time, /.shard is in split-brain as per heal-info, and the VMs seem to have suffered corruption and
>   are in an irrecoverable state.
>
> I checked the logs. The pattern is very similar to the one in the add-brick bug Lindsay reported here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1387878. It seems like something goes wrong each time there is a
> graph switch.
>
> @Aravinda and Pranith:
>
> I will need some time to debug this, if the 3.9 release can wait until it is RC'd and fixed.
> Otherwise we will need to caution users that replace-brick, add-brick, etc. (or any form of graph switch, for
> that matter) *might* cause VM corruption in 3.9.0, irrespective of whether they are using FUSE or gfapi.
>
> Let me know what your decision is.

Since this bug is not a regression, let us document it as a known issue and do our best to get the fix into the next release. I am almost done with testing afr and ec.
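
For reference, the brick-kill/heal check in the report above reduces to roughly the cycle below. The volume name is the one from the report; the mount point, file size and polling interval are assumptions, and the dd here runs directly against the FUSE mount rather than inside a VM:

    VOL=rep
    MNT=/mnt/rep                  # FUSE mount point (assumption)

    # 1. Kill one brick process; its PID is in the last column of
    #    'gluster volume status'.
    gluster volume status "$VOL"
    kill -KILL "$BRICK_PID"       # BRICK_PID copied by hand from the output above

    # 2. Write a source file through the mount and record its checksum.
    dd if=/dev/urandom of="$MNT/src" bs=1M count=512 oflag=direct
    md5sum "$MNT/src"

    # 3. Restart the killed brick and wait for self-heal to drain.
    gluster volume start "$VOL" force
    while gluster volume heal "$VOL" info | grep -q 'Number of entries: [1-9]'; do
        sleep 10
    done

    # 4. The checksum should match the one from step 2, and the shards under
    #    <brick>/.shard should be byte-identical across the three bricks.
    md5sum "$MNT/src"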
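
The step that failed reduces to something like this; the new brick path below is only illustrative, since the report does not name the exact path that was used:

    VOL=rep
    MNT=/mnt/rep                  # FUSE mount point (assumption)

    # Keep a copy running, comparable to the dd load inside the VMs.
    dd if="$MNT/src" of="$MNT/dst" bs=1M oflag=direct &

    # While the copy is in flight, replace the third brick with a new brick on
    # the same node but a different path; this forces a graph switch on the
    # clients. /bricks/rep4_new is an assumed path, not the one actually used.
    gluster volume replace-brick "$VOL" \
        srv-3:/bricks/rep4 srv-3:/bricks/rep4_new commit force

    wait
    # Expected: dd finishes cleanly and 'gluster volume heal rep info' shows no
    # split-brain entries. Observed in the report: dd fails with "Input/output
    # error" and /.shard ends up in split-brain.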
For afr, there were no leaks etc. in the tests I did, but I am seeing a performance drop in the crawl-related tests.

This is with 3.9.0rc2:

running directory_crawl_create ... done (252.91 secs)
running directory_crawl ... done (104.83 secs)
running directory_recrawl ... done (71.20 secs)
running metadata_modify ... done (324.83 secs)
running directory_crawl_delete ... done (124.22 secs)

This is with 3.8.5:

running directory_crawl_create ... done (176.48 secs)
running directory_crawl ... done (9.99 secs)
running directory_recrawl ... done (7.15 secs)
running metadata_modify ... done (198.36 secs)
running directory_crawl_delete ... done (89.32 secs)

I am not seeing good performance with ec in 3.9.0rc2 compared to 3.8.5 either.

With v3.9.0rc2:

running emptyfiles_create ... done (1278.63 secs)
running emptyfiles_delete ... done (254.60 secs)
running smallfiles_create ... done (1663.04 secs)

With v3.8.5:

emptyfiles_create 756.11
emptyfiles_delete 349.97
smallfiles_create 903.47

Functionality is fine in both cases; only the performance differs. Since these are regressions, I will spend some time on them to find the reason.
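
In case it helps anyone else eyeball the crawl regression without pulling in the full test script, a crude stand-in looks like the following. This is not the actual benchmark harness; the mount point and tree sizes are assumptions, it is just enough to compare two builds on one FUSE mount:

    #!/bin/bash
    # Rough stand-in for the directory_crawl_* phases above.
    MNT=/mnt/rep          # FUSE mount of the volume under test (assumption)
    DIRS=100
    FILES=100

    create_tree() {
        for d in $(seq 1 "$DIRS"); do
            mkdir -p "$MNT/perf/$d"
            for f in $(seq 1 "$FILES"); do
                : > "$MNT/perf/$d/$f"
            done
        done
    }

    time create_tree                           # ~ directory_crawl_create
    echo 3 > /proc/sys/vm/drop_caches          # cold client-side caches (needs root)
    time find "$MNT/perf" -type f > /dev/null  # ~ directory_crawl
    time find "$MNT/perf" -type f > /dev/null  # ~ directory_recrawl (warm)
    time rm -rf "$MNT/perf"                    # ~ directory_crawl_delete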
> -Krutika
>
>
> On Wed, Oct 26, 2016 at 8:04 PM, Aravinda <avish...@redhat.com> wrote:
>
>> Gluster 3.9.0rc2 tarball is available here:
>> http://bits.gluster.org/pub/gluster/glusterfs/src/glusterfs-3.9.0rc2.tar.gz
>>
>> regards
>> Aravinda
>>
>>
>> On Tuesday 25 October 2016 04:12 PM, Aravinda wrote:
>>
>>> Hi,
>>>
>>> Since the automated test framework for Gluster is still in progress, we need help
>>> from maintainers and developers to test the features and bug fixes to release
>>> Gluster 3.9.
>>>
>>> In the last maintainers meeting, Shyam shared an idea about having a Test day
>>> to accelerate the testing and release.
>>>
>>> Please participate in testing your component(s) on Oct 27, 2016. We will
>>> prepare the rc2 build by tomorrow and share the details before the Test day.
>>>
>>> RC1 Link: http://www.gluster.org/pipermail/maintainers/2016-September/001442.html
>>> Release Checklist: https://public.pad.fsfe.org/p/gluster-component-release-checklist
>>>
>>>
>>> Thanks and Regards
>>> Aravinda and Pranith
>>>
>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>
>
--
Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel