# gluster --version
glusterfs 3.7.9 built on Jun 10 2016 06:32:42
Try not to make fun of my python, but I was able to make a small modification
to the sync_files.py script from smallfile and at least enable my team to move
on with testing. It's terribly hacky and ugly, but it works around the
problem, which I am pretty convinced is a Gluster bug at this point. (A
consolidated sketch of the patched function, and a minimal standalone
reproducer, are appended at the bottom of this message.)

# diff bin/sync_files.py.orig bin/sync_files.py
6a7,8
> import errno
> import binascii
27c29,40
<     shutil.rmtree(master_invoke.network_dir)
---
>     try:
>         shutil.rmtree(master_invoke.network_dir)
>     except OSError as e:
>         err = e.errno
>         if err != errno.EEXIST:
>             # workaround for possible bug in Gluster
>             if err != errno.ENOTEMPTY:
>                 raise e
>             else:
>                 print('saw ENOTEMPTY on stonewall, moving shared directory')
>                 ext = str(binascii.b2a_hex(os.urandom(15)))
>                 shutil.move(master_invoke.network_dir, master_invoke.network_dir + ext)

Dustin Black, RHCA
Senior Architect, Software-Defined Storage
Red Hat, Inc.
(o) +1.212.510.4138 (m) +1.215.821.7423
dus...@redhat.com


On Tue, Oct 18, 2016 at 7:09 PM, Dustin Black <dbl...@redhat.com> wrote:
> Dang. I always think I get all the detail and inevitably leave out
> something important. :-/
>
> I'm mobile and don't have the exact version in front of me, but this is
> recent if not the latest RHGS on RHEL 7.2.
>
>
> On Oct 18, 2016 7:04 PM, "Dan Lambright" <dlamb...@redhat.com> wrote:
>
>> Dustin,
>>
>> What level code? I often run smallfile on upstream code with tiered
>> volumes and have not seen this.
>>
>> Sure, one of us will get back to you.
>>
>> Unfortunately, gluster has a lot of protocol overhead (LOOKUPs), which
>> overwhelms the boost in transfer speeds you get for small files. A
>> presentation at the Berlin gluster summit evaluated this. The expectation
>> is that md-cache will go a long way toward helping with that before too
>> long.
>>
>> Dan
>>
>>
>>
>> ----- Original Message -----
>> > From: "Dustin Black" <dbl...@redhat.com>
>> > To: gluster-devel@gluster.org
>> > Cc: "Annette Clewett" <aclew...@redhat.com>
>> > Sent: Tuesday, October 18, 2016 4:30:04 PM
>> > Subject: [Gluster-devel] Possible race condition bug with tiered volume
>> >
>> > I have a 3x2 hot tier on NVMe drives with a 3x2 cold tier on RAID6
>> > drives.
>> >
>> > # gluster vol info 1nvme-distrep3x2
>> > Volume Name: 1nvme-distrep3x2
>> > Type: Tier
>> > Volume ID: 21e3fc14-c35c-40c5-8e46-c258c1302607
>> > Status: Started
>> > Number of Bricks: 12
>> > Transport-type: tcp
>> > Hot Tier :
>> > Hot Tier Type : Distributed-Replicate
>> > Number of Bricks: 3 x 2 = 6
>> > Brick1: n5:/rhgs/hotbricks/1nvme-distrep3x2-hot
>> > Brick2: n4:/rhgs/hotbricks/1nvme-distrep3x2-hot
>> > Brick3: n3:/rhgs/hotbricks/1nvme-distrep3x2-hot
>> > Brick4: n2:/rhgs/hotbricks/1nvme-distrep3x2-hot
>> > Brick5: n1:/rhgs/hotbricks/1nvme-distrep3x2-hot
>> > Brick6: n0:/rhgs/hotbricks/1nvme-distrep3x2-hot
>> > Cold Tier:
>> > Cold Tier Type : Distributed-Replicate
>> > Number of Bricks: 3 x 2 = 6
>> > Brick7: n0:/rhgs/coldbricks/1nvme-distrep3x2
>> > Brick8: n1:/rhgs/coldbricks/1nvme-distrep3x2
>> > Brick9: n2:/rhgs/coldbricks/1nvme-distrep3x2
>> > Brick10: n3:/rhgs/coldbricks/1nvme-distrep3x2
>> > Brick11: n4:/rhgs/coldbricks/1nvme-distrep3x2
>> > Brick12: n5:/rhgs/coldbricks/1nvme-distrep3x2
>> > Options Reconfigured:
>> > cluster.tier-mode: cache
>> > features.ctr-enabled: on
>> > performance.readdir-ahead: on
>> >
>> >
>> > I am attempting to run the 'smallfile' benchmark tool on this volume.
>> > The 'smallfile' tool creates a starting gate directory and files in a
>> > shared filesystem location.
>> > The first run (write) works as expected.
>> >
>> > # smallfile_cli.py --threads 12 --file-size 4096 --files 300 --top \
>> >   /rhgs/client/1nvme-distrep3x2 --host-set \
>> >   c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11 --prefix test1 --stonewall Y \
>> >   --network-sync-dir /rhgs/client/1nvme-distrep3x2/smf1 --operation create
>> >
>> > For the second run (read), I believe that smallfile first attempts to
>> > 'rm -rf' the "network-sync-dir" path, which fails with ENOTEMPTY,
>> > causing the run to fail.
>> >
>> > # smallfile_cli.py --threads 12 --file-size 4096 --files 300 --top \
>> >   /rhgs/client/1nvme-distrep3x2 --host-set \
>> >   c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11 --prefix test1 --stonewall Y \
>> >   --network-sync-dir /rhgs/client/1nvme-distrep3x2/smf1 --operation create
>> > ...
>> > Traceback (most recent call last):
>> >   File "/root/bin/smallfile_cli.py", line 280, in <module>
>> >     run_workload()
>> >   File "/root/bin/smallfile_cli.py", line 270, in run_workload
>> >     return run_multi_host_workload(params)
>> >   File "/root/bin/smallfile_cli.py", line 62, in run_multi_host_workload
>> >     sync_files.create_top_dirs(master_invoke, True)
>> >   File "/root/bin/sync_files.py", line 27, in create_top_dirs
>> >     shutil.rmtree(master_invoke.network_dir)
>> >   File "/usr/lib64/python2.7/shutil.py", line 256, in rmtree
>> >     onerror(os.rmdir, path, sys.exc_info())
>> >   File "/usr/lib64/python2.7/shutil.py", line 254, in rmtree
>> >     os.rmdir(path)
>> > OSError: [Errno 39] Directory not empty: '/rhgs/client/1nvme-distrep3x2/smf1'
>> >
>> >
>> > From the client perspective, the directory is clearly empty.
>> >
>> > # ls -a /rhgs/client/1nvme-distrep3x2/smf1/
>> > . ..
>> >
>> >
>> > And a quick search on the bricks shows that the hot tier on the last
>> > replica pair is the offender.
>> >
>> > # for i in {0..5}; do ssh n$i "hostname; \
>> >     ls /rhgs/coldbricks/1nvme-distrep3x2/smf1 | wc -l; \
>> >     ls /rhgs/hotbricks/1nvme-distrep3x2-hot/smf1 | wc -l"; done
>> > rhosd0
>> > 0
>> > 0
>> > rhosd1
>> > 0
>> > 0
>> > rhosd2
>> > 0
>> > 0
>> > rhosd3
>> > 0
>> > 0
>> > rhosd4
>> > 0
>> > 1
>> > rhosd5
>> > 0
>> > 1
>> >
>> >
>> > (For the record, multiple runs of this reproducer show that it is
>> > consistently the hot tier that is to blame, but it is not always the
>> > same replica pair.)
>> >
>> >
>> > Can someone try recreating this scenario to see if the problem is
>> > consistent? Please reach out if you need me to provide any further
>> > details.
>> >
>> >
>> > Dustin Black, RHCA
>> > Senior Architect, Software-Defined Storage
>> > Red Hat, Inc.
>> > (o) +1.212.510.4138 (m) +1.215.821.7423
>> > dus...@redhat.com
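
For anyone who'd rather not apply the diff mentally, the patched block of
create_top_dirs() in sync_files.py ends up looking roughly like the sketch
below. Only the try/except body comes from the diff above; the function
signature is inferred from the traceback, and the existence check and the
final directory re-creation are assumptions about the surrounding smallfile
code.

import binascii
import errno
import os
import shutil

def create_top_dirs(master_invoke, is_multi_host):
    # ... preceding smallfile logic assumed ...
    if os.path.exists(master_invoke.network_dir):
        try:
            shutil.rmtree(master_invoke.network_dir)
        except OSError as e:
            err = e.errno
            if err != errno.EEXIST:
                # Workaround for the suspected Gluster bug: the tiered volume
                # can return ENOTEMPTY for a directory that looks empty from
                # the client, so rename the directory out of the way with a
                # random suffix instead of failing the run.
                if err != errno.ENOTEMPTY:
                    raise e
                else:
                    print('saw ENOTEMPTY on stonewall, moving shared directory')
                    ext = str(binascii.b2a_hex(os.urandom(15)))
                    shutil.move(master_invoke.network_dir,
                                master_invoke.network_dir + ext)
    os.makedirs(master_invoke.network_dir)  # assumed: recreate the shared dir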
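
And for anyone who wants to try recreating the scenario without smallfile, a
minimal sketch of the same create-then-delete pattern is below. The mount
point is the one from this thread, the file count mirrors --files 300, and
the 4 KiB payload is only an approximation (smallfile's --file-size is in
KB). Since the race appears to involve tier activity, a single serial client
like this may need many cycles, or several parallel clients, to trip it.

#!/usr/bin/env python
# Minimal reproducer sketch: populate a shared directory on the tiered-volume
# mount, then rmtree it, and watch for a spurious ENOTEMPTY.
import os
import shutil

MOUNT = '/rhgs/client/1nvme-distrep3x2'   # Gluster client mount from this thread
SHARED = os.path.join(MOUNT, 'smf1')

for run in range(50):                     # arbitrary number of cycles
    os.makedirs(SHARED)
    for i in range(300):                  # mirrors --files 300
        with open(os.path.join(SHARED, 'f%04d' % i), 'w') as f:
            f.write('x' * 4096)           # 4 KiB per file (approximation)
    try:
        shutil.rmtree(SHARED)
    except OSError as e:
        print('run %d: rmtree failed: %s' % (run, e))
        print('client view: %s' % os.listdir(SHARED))
        break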
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel