I just reposted this patch addressing all the comments except two. I added those comments from older version of patchset to this one. Could some one who knows the following two files please help me resolve them: http://review.gluster.org/#/c/6475/5/xlators/features/qemu-block/src/bdrv-xlator.c http://review.gluster.org/#/c/6475/5/xlators/features/qemu-block/src/qb-coroutines.c
Pranith ----- Original Message ----- > From: "Pranith Kumar Karampuri" <pkara...@redhat.com> > To: "Anand Avati" <av...@gluster.org>, "Brian Foster" <bfos...@redhat.com> > Cc: gluster-devel@nongnu.org > Sent: Thursday, January 2, 2014 9:16:19 PM > Subject: Re: [Gluster-devel] important change to syncop infra > > Avati, Brian, > Could you guys please do a round of review for the following files and > provide your valuable inputs. > http://review.gluster.com/#/c/6475/4/xlators/features/qemu-block/src/bdrv-xlator.c > http://review.gluster.com/#/c/6475/4/xlators/features/qemu-block/src/qb-coroutines.c > > Pranith. > > ----- Original Message ----- > > From: "Pranith Kumar Karampuri" <pkara...@redhat.com> > > To: "Anand Avati" <av...@gluster.org> > > Cc: gluster-devel@nongnu.org > > Sent: Friday, December 13, 2013 9:09:03 AM > > Subject: Re: [Gluster-devel] important change to syncop infra > > > > hi, > > If there are no more objections, I will send the final patch on > > Monday. > > > > Pranith. > > ----- Original Message ----- > > > From: "Anand Avati" <av...@gluster.org> > > > To: "Anand Subramanian" <ana...@redhat.com> > > > Cc: "Pranith Kumar Karampuri" <pkara...@redhat.com>, > > > gluster-devel@nongnu.org > > > Sent: Thursday, December 12, 2013 12:09:34 PM > > > Subject: Re: [Gluster-devel] important change to syncop infra > > > > > > The decision of doing away with -O2 is beyond our control, and we > > > shouldn't > > > have code which depend on optimization to be disabled to behave properly. > > > Representing -errno as return is the cleanest fix (that's how other > > > projects which use setcontext/getcontext are behaving too.) If there are > > > any new further issues which arise from setcontext/getcontext, I'm > > > tempted > > > to change the internal implementation to use a a vanilla pthread pool. > > > > > > Avati > > > > > > > > > On Wed, Dec 11, 2013 at 10:29 PM, Anand Subramanian > > > <ansub...@redhat.com>wrote: > > > > > > > Is doing away with -O2 an option that was ever considered or is it that > > > > we > > > > simply must have O2 on? > > > > > > > > (I understand that turning off O2 can open some so-far-unexposed can of > > > > worms and a lot of soaking maybe required, and also that we may have > > > > had > > > > a > > > > good set of perf related reasons to have settled on -O2 in the first > > > > place, > > > > but wanted to understand nevertheless...) > > > > > > > > Anand > > > > > > > > > > > > > > > > On 12/11/2013 02:21 PM, Pranith Kumar Karampuri wrote: > > > > > > > >> hi, > > > >> We found a day-1 bug when syncop_xxx() infra is used inside a > > > >> synctask with compilation optimization (CFLAGS -O2). This bug has been > > > >> dormant for at least 2 years. > > > >> There are around ~400(rebalance, replace-brick, bd, self-heal-daemon, > > > >> quota, fuse lock/fd migration) places where syncop is used in the code > > > >> base > > > >> all of which are potential candidates which can take the hit. > > > >> > > > >> I sent first round of patch at http://review.gluster.com/6475 to catch > > > >> regressions upstream. > > > >> These are the files that are affected by the changes I introduced to > > > >> fix > > > >> this: > > > >> > > > >> api/src/glfs-fops.c | 36 > > > >> ++++++++++++++++++++++++++++++++++ > > > >> api/src/glfs-handleops.c | 15 ++++++++++++++ > > > >> api/src/glfs-internal.h | 7 +++++++ > > > >> api/src/glfs-resolve.c | 10 ++++++++++ > > > >> libglusterfs/src/syncop.c | 117 > > > >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > >> ++++++++++++------------------------------------- > > > >> xlators/cluster/afr/src/afr-self-heald.c | 45 > > > >> +++++++++++++++++++++++++++++------------- > > > >> xlators/cluster/afr/src/pump.c | 12 ++++++++++-- > > > >> xlators/cluster/dht/src/dht-helper.c | 24 > > > >> +++++++++++++++-------- > > > >> xlators/cluster/dht/src/dht-rebalance.c | 168 > > > >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > >> ++++++++++++++++++++++++++++++++++-------------------------- > > > >> ------------------------------------- > > > >> xlators/cluster/dht/src/dht-selfheal.c | 6 ++++-- > > > >> xlators/features/locks/src/posix.c | 3 ++- > > > >> xlators/features/qemu-block/src/bdrv-xlator.c | 15 ++++---------- > > > >> xlators/features/qemu-block/src/qb-coroutines.c | 14 ++++++++++---- > > > >> xlators/mount/fuse/src/fuse-bridge.c | 16 > > > >> ++++++++++----- > > > >> > > > >> Please review your respective component for these changes in gerrit. > > > >> > > > >> Thanks > > > >> Pranith. > > > >> > > > >> Detailed explanation of the Root cause: > > > >> We found the bug in 'gf_defrag_migrate_data' in rebalance operation: > > > >> > > > >> Lets look at interesting parts of the function: > > > >> > > > >> int > > > >> gf_defrag_migrate_data (xlator_t *this, gf_defrag_info_t *defrag, > > > >> loc_t > > > >> *loc, > > > >> dict_t *migrate_data) > > > >> { > > > >> ..... > > > >> code section - [ Loop ] > > > >> while ((ret = syncop_readdirp (this, fd, 131072, offset, > > > >> NULL, > > > >> &entries)) != 0) { > > > >> ..... > > > >> code section - [ ERRNO-1 ] (errno of readdirp is stored in > > > >> readdir_operrno by a thread) > > > >> /* Need to keep track of ENOENT errno, that means, > > > >> there > > > >> is no > > > >> need to send more readdirp() */ > > > >> readdir_operrno = errno; > > > >> ..... > > > >> code section - [ SYNCOP-1 ] (syncop_getxattr is called by a thread) > > > >> ret = syncop_getxattr (this, &entry_loc, > > > >> &dict, > > > >> GF_XATTR_LINKINFO_KEY); > > > >> code section - [ ERRNO-2] (checking for failures of > > > >> syncop_getxattr(). > > > >> This may not always be executed in same thread which executed > > > >> [SYNCOP-1]) > > > >> if (ret < 0) { > > > >> if (errno != ENODATA) { > > > >> loglevel = GF_LOG_ERROR; > > > >> defrag->total_failures += 1; > > > >> ..... > > > >> } > > > >> > > > >> the function above could be executed by thread(t1) till [SYNCOP-1] and > > > >> code from [ERRNO-2] can be executed by a different thread(t2) because > > > >> of > > > >> the way syncop-infra schedules the tasks. > > > >> > > > >> when the code is compiled with -O2 optimization this is the assembly > > > >> code > > > >> that is generated: > > > >> [ERRNO-1] > > > >> 1165 readdir_operrno = errno; <<---- errno gets > > > >> expanded as *(__errno_location()) > > > >> 0x00007fd149d48b60 <+496>: callq 0x7fd149d410c0 > > > >> <__errno_location@plt> > > > >> 0x00007fd149d48b72 <+514>: mov %rax,0x50(%rsp) <<------ > > > >> Address returned by __errno_location() is stored in a special location > > > >> in > > > >> stack for later use. > > > >> 0x00007fd149d48b77 <+519>: mov (%rax),%eax > > > >> 0x00007fd149d48b79 <+521>: mov %eax,0x78(%rsp) > > > >> .... > > > >> [ERRNO-2] > > > >> 1281 if (errno != ENODATA) { > > > >> 0x00007fd149d492ae <+2366>: mov 0x50(%rsp),%rax <<----- > > > >> Because it already stored the address returned by __errno_location(), > > > >> it > > > >> just dereferences the address to get the errno value. BUT THIS CODE > > > >> NEED > > > >> NOT BE EXECUTED BY SAME THREAD!!! > > > >> 0x00007fd149d492b3 <+2371>: mov $0x9,%ebp > > > >> 0x00007fd149d492b8 <+2376>: mov (%rax),%edi > > > >> 0x00007fd149d492ba <+2378>: cmp $0x3d,%edi > > > >> > > > >> The problem is that __errno_location() value of t1 and t2 are > > > >> different. > > > >> So [ERRNO-2] ends up reading errno of t1 instead of errno of t2 even > > > >> though > > > >> t2 is executing [ERRNO-2] code section. > > > >> > > > >> When code is compiled without any optimization for [ERRNO-2]: > > > >> 1281 if (errno != ENODATA) { > > > >> 0x00007fd58e7a326f <+2237>: callq 0x7fd58e797300 > > > >> <__errno_location@plt><<--- As it is calling __errno_location() again > > > >> it > > > >> gets the location from t2 so it works as intended. > > > >> 0x00007fd58e7a3274 <+2242>: mov (%rax),%eax > > > >> 0x00007fd58e7a3276 <+2244>: cmp $0x3d,%eax > > > >> 0x00007fd58e7a3279 <+2247>: je 0x7fd58e7a32a1 > > > >> <gf_defrag_migrate_data+2287> > > > >> > > > >> Fix: > > > >> We decided to make syncop_xxx() return (-errno) value as the return > > > >> value > > > >> in case of errors and all the functions which make syncop_xxx() will > > > >> need > > > >> to use (-ret) to figure out the reason for failure in case of > > > >> syncop_xxx() > > > >> failures. > > > >> > > > >> Pranith > > > >> > > > >> _______________________________________________ > > > >> Gluster-devel mailing list > > > >> Gluster-devel@nongnu.org > > > >> https://lists.nongnu.org/mailman/listinfo/gluster-devel > > > >> > > > > > > > > > > > > _______________________________________________ > > > > Gluster-devel mailing list > > > > Gluster-devel@nongnu.org > > > > https://lists.nongnu.org/mailman/listinfo/gluster-devel > > > > > > > > > > > _______________________________________________ > > Gluster-devel mailing list > > Gluster-devel@nongnu.org > > https://lists.nongnu.org/mailman/listinfo/gluster-devel > > > > _______________________________________________ > Gluster-devel mailing list > Gluster-devel@nongnu.org > https://lists.nongnu.org/mailman/listinfo/gluster-devel > _______________________________________________ Gluster-devel mailing list Gluster-devel@nongnu.org https://lists.nongnu.org/mailman/listinfo/gluster-devel