@ Raghavendra. The io threads stack wind/ unwind may be the reason This is what the stack looked like when it hit the mprotect guarded page. I am not sure why there are so few frames.. but the register values did match to the guarded address. so we know stack was being overused. (gdb) where #0 0x00007f8a92c51204 in _dl_lookup_symbol_x () from /lib64/ld-linux-x86-64.so.2 #1 0x00007f8a92c561e3 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2 #2 0x00007f8a92c5dd33 in _dl_runtime_resolve_avx () from /lib64/ld-linux-x86-64.so.2 #3 0x0000000000000000 in ?? ()
If we try without it overflows the syncop stack(which was allocated with callloc) and goes to some other address allocated though calloc and results in a segfault elsewhere. @Kinglong, Yes, increasing stack size to 256 k was enough to alleviate the issue in my tests. Regards, Sanoj On Wed, Jun 28, 2017 at 6:10 PM, Kinglong Mee <[email protected]> wrote: > Hi Sanoj, > > Do you testing with changing the 16K to 2M (default stack size)? > > If also overflowing, there must be an important issue exist; > not overflowing, only the issue of 16K is small for marker without > io-threads? > > thanks, > Kinglong Mee > > On 6/28/2017 19:48, Sanoj Unnikrishnan wrote: > > Hi Kinglong, > > > > I opened https://bugzilla.redhat.com/show_bug.cgi?id=1465861 to track > this. It seems to be an important issue that went undetected. > > Thanks for catching this!! > > > > In my attempts to repro this , I found that on each run some random > structures where getting corrupted and running into segfault. > > In order to assert that the stack was indeed growing into all the > allocated space and beyond, I set a guard page in the end of the allocated > stack space (so that we hit a segfault before overusing the space). > > Below are the code changes. > > > > @@ -443,6 +443,8 @@ synctask_create (struct syncenv *env, size_t > stacksize, synctask_fn_t fn, > > struct synctask *newtask = NULL; > > xlator_t *this = THIS; > > int destroymode = 0; > > + int r=0; > > + char *v; > > > > VALIDATE_OR_GOTO (env, err); > > VALIDATE_OR_GOTO (fn, err); > > @@ -498,9 +500,15 @@ synctask_create (struct syncenv *env, size_t > stacksize, synctask_fn_t fn, > > gf_common_mt_syncstack); > > newtask->ctx.uc_stack.ss_size = env->stacksize; > > } else { > > - newtask->stack = GF_CALLOC (1, stacksize, > > + newtask->stack = GF_CALLOC (1, stacksize, > > gf_common_mt_syncstack); > > newtask->ctx.uc_stack.ss_size = stacksize; > > + if (stacksize == 16*1024) { > > + v = (unsigned long)((char *)(newtask->stack) + > 4095) & (~4095); > > + r = mprotect(v, 4096, PROT_NONE); > > + gf_msg ("syncop", GF_LOG_ERROR, errno, > > + LG_MSG_GETCONTEXT_FAILED, "SKU: using > 16k stack starting at %p, mprotect returned %d, guard page: %p", > newtask->stack, r, v); > > + } > > } > > > > (gdb) where > > #0 0x00007f8a92c51204 in _dl_lookup_symbol_x () from > /lib64/ld-linux-x86-64.so.2 > > #1 0x00007f8a92c561e3 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2 > > #2 0x00007f8a92c5dd33 in _dl_runtime_resolve_avx () from > /lib64/ld-linux-x86-64.so.2 > > #3 0x0000000000000000 in ?? () > > > > > > (gdb) info reg > > > > rdi 0x7f8a92946188 140233141412232 > > rbp 0x7f8a800b4000 0x7f8a800b4000 > > rsp 0x7f8a800b4000 0x7f8a800b4000 > > r8 0x7f8a92e4ba60 140233146677856 > > > > (gdb) layout asm > > > > >│0x7f8a92c51204 <_dl_lookup_symbol_x+4> push %r15 > <== push on stack at the guarded page caused segfault > > > > From the brick log we have, > > > > [syncop.c:515:synctask_create] 0-syncop: SKU: using 16k stack starting > at 0x7f8a800b28f0, mprotect returned 0, guard page: 0x7f8a800b3000 [No data > available] > > > > Stack grows downward from 0x7f8a800b68f0 to 0x7f8a800b28f0 and the page > 0x7f8a800b3000 - 0x7f8a800b4000 is guarded , which is where the segfault > hit as seen in gdb. > > > > This confirms that the stack space is not sufficient and overflowing, > > I am not sure why we don't hit this in the presence of IO threads > though, It may just be that with io threads in graph, we may have some > allocated and unused memory which our stack freely grows into. It may just > be a silent undetected reuse of some memory. > > > > Regards, > > Sanoj > > > > > > > > On Tue, Jun 13, 2017 at 10:49 AM, Sanoj Unnikrishnan < > [email protected] <mailto:[email protected]>> wrote: > > > > Hi Kinglong > > I was reading about makecontext/swapcontext as well, > > I did find an article that suggested to use mprotect and force a > segfault to check if we have a stack space issue here. > > here is the link. http://www.evanjones.ca/software/threading.html < > http://www.evanjones.ca/software/threading.html>. > > > > I don't think i can try this until tomorrow. > > thanks and regards, > > Sanoj > > > > On Tue, Jun 13, 2017 at 5:58 AM, Kinglong Mee <[email protected] > <mailto:[email protected]>> wrote: > > > > Hi Sanoj, > > > > What's your opinion about this problem? > > > > thanks, > > Kinglong Mee > > > > On 6/9/2017 17:20, Kinglong Mee wrote: > > > Hi Sanoj, > > > > > > On 6/9/2017 15:48, Sanoj Unnikrishnan wrote: > > >> I have not used valgrind before, so I may be wrong here. > > >> > > >> I think the valgrind_stack_deregister should have been after > GF_FREE_STACK. > > >> That may explain the instance of invalid_write during > stack_destroy calls in after.log. > > > > > > No. I move it, but the instance of invalid_write also exist. > > > > > >> > > >> There seems to be numerous issues reported in before.log (I > am assuming, you did not have the valgrind_stack_register call in it), > > > > > > Yes, the before.log is test without any code change(but > without io-threads). > > > > > >> From http://valgrind.org/docs/manual/manual-core.html < > http://valgrind.org/docs/manual/manual-core.html> < > http://valgrind.org/docs/manual/manual-core.html < > http://valgrind.org/docs/manual/manual-core.html>>, looks like valgrind > detects client switching stack only If a memory of > 2MB change in Stack > pointer register. > > > > > > I test with a larger max-stackframe as, > > > valgrind --leak-check=full --max-stackframe=242293216 > > > > > >> Is it possible that since marker is only using 16k, the stack > pointer could have been in less than 2MB offset from current Stack Pointer? > > > > > > Maybe. > > > But with io-threads (with adding valgrind_stack_deregister), > the valgrind only show some > > > "Invalid read/write" about __gf_mem_invalidate. > > > The only reason here I think is the stack size (16K) of marker > using. > > > > > > I have not used makecontext/swapcontext before, am i right? > > > 1. without swapconext, the stack maybe (just an example) > > > --> io_stats-> quota-> marker-> io-threads ->..... > > > > > > 2. with swapcontext, > > > --> io_stats-> quota-> marker > > > swithto new stack -> io-threads > > > > > > After switchto new stack, the new stack size is 16K, > > > Does it enough without io-threads? > > > > > > I don't know what's the behave of io-threads, does it call all > sub-xlator using the 16K ? or others? > > > > > >> It seems unlikely to me since we are allocating the stack > from heap. > > >> > > >> Did you try a run with the valgrind instrumentation, without > changing stack size ? > > > > > > OK. > > > The following valgrind-without-stack-change.log is test with > adding valgrind_stack_deregister > > > (and without io-threads). > > > > > > thanks, > > > Kinglong Mee > > > > > >> None of this explains the crash though.. We had seen a memory > overrun crash in same code path on netbsd earlier but did not follow up > then. > > >> Will look further into it. > > >> > > >> > > >> > > >> On Thu, Jun 8, 2017 at 4:51 PM, Kinglong Mee < > [email protected] <mailto:[email protected]> <mailto: > [email protected] <mailto:[email protected]>>> wrote: > > >> > > >> Maybe it's my fault, I found valgrind can't parse context > switch(makecontext/swapcontext) by default. > > >> So, I test with the following patch (tells valgrind new > stack by VALGRIND_STACK_DEREGISTER). > > >> With it, only some "Invalid read/write" by > __gf_mem_invalidate, Does it right ?? > > >> So, there is only one problem, if without io-threads, the > stack size is small for marker. > > >> Am I right? > > >> > > >> Ps: > > >> valgrind-before.log is the log without the following > patch, the valgrind-after.log is with the patch. > > >> > > >> ==35656== Invalid write of size 8 > > >> ==35656== at 0x4E8FFD4: __gf_mem_invalidate > (mem-pool.c:278) > > >> ==35656== by 0x4E90313: __gf_free (mem-pool.c:334) > > >> ==35656== by 0x4EA4E5B: synctask_destroy (syncop.c:394) > > >> ==35656== by 0x4EA4EDF: synctask_done (syncop.c:412) > > >> ==35656== by 0x4EA58B3: synctask_switchto > (syncop.c:673) > > >> ==35656== by 0x4EA596B: syncenv_processor > (syncop.c:704) > > >> ==35656== by 0x60B2DC4: start_thread (in /usr/lib64/ > libpthread-2.17.so <http://libpthread-2.17.so> <http://libpthread-2.17.so > >) > > >> ==35656== by 0x67A873C: clone (in /usr/lib64/ > libc-2.17.so <http://libc-2.17.so> <http://libc-2.17.so>) > > >> ==35656== Address 0x1b104931 is 2,068,017 bytes inside a > block of size 2,097,224 alloc'd > > >> ==35656== at 0x4C29975: calloc > (vg_replace_malloc.c:711) > > >> ==35656== by 0x4E8FA5E: __gf_calloc (mem-pool.c:117) > > >> ==35656== by 0x4EA52F5: synctask_create (syncop.c:500) > > >> ==35656== by 0x4EA55AE: synctask_new1 (syncop.c:576) > > >> ==35656== by 0x143AE0D7: mq_synctask1 > (marker-quota.c:1078) > > >> ==35656== by 0x143AE199: mq_synctask > (marker-quota.c:1097) > > >> ==35656== by 0x143AE6F6: _mq_create_xattrs_txn > (marker-quota.c:1236) > > >> ==35656== by 0x143AE82D: mq_create_xattrs_txn > (marker-quota.c:1253) > > >> ==35656== by 0x143B0DCB: mq_inspect_directory_xattr > (marker-quota.c:2027) > > >> ==35656== by 0x143B13A8: mq_xattr_state > (marker-quota.c:2117) > > >> ==35656== by 0x143A6E80: marker_lookup_cbk > (marker.c:2961) > > >> ==35656== by 0x141811E0: up_lookup_cbk (upcall.c:753) > > >> > > >> ----------------------- valgrind > ------------------------------------------ > > >> > > >> Don't forget install valgrind-devel. > > >> > > >> diff --git a/libglusterfs/src/syncop.c > b/libglusterfs/src/syncop.c > > >> index 00a9b57..97b1de1 100644 > > >> --- a/libglusterfs/src/syncop.c > > >> +++ b/libglusterfs/src/syncop.c > > >> @@ -10,6 +10,7 @@ > > >> > > >> #include "syncop.h" > > >> #include "libglusterfs-messages.h" > > >> +#include <valgrind/valgrind.h> > > >> > > >> int > > >> syncopctx_setfsuid (void *uid) > > >> @@ -388,6 +389,8 @@ synctask_destroy (struct synctask > *task) > > >> if (!task) > > >> return; > > >> > > >> +VALGRIND_STACK_DEREGISTER(task->valgrind_ret); > > >> + > > >> GF_FREE (task->stack); > > >> > > >> if (task->opframe) > > >> @@ -509,6 +512,8 @@ synctask_create (struct syncenv *env, > size_t stacksize, sync > > >> > > >> newtask->ctx.uc_stack.ss_sp = newtask->stack; > > >> > > >> + newtask->valgrind_ret = > > VALGRIND_STACK_REGISTER(newtask->stack, > newtask- > > >> + > > >> makecontext (&newtask->ctx, (void (*)(void)) > synctask_wrap, 2, newtask) > > >> > > >> newtask->state = SYNCTASK_INIT; > > >> diff --git a/libglusterfs/src/syncop.h > b/libglusterfs/src/syncop.h > > >> index c2387e6..247325b 100644 > > >> --- a/libglusterfs/src/syncop.h > > >> +++ b/libglusterfs/src/syncop.h > > >> @@ -63,6 +63,7 @@ struct synctask { > > >> int woken; > > >> int slept; > > >> int ret; > > >> + int valgrind_ret; > > >> > > >> uid_t uid; > > >> gid_t gid; > > >> diff --git a/xlators/features/marker/src/marker-quota.c > b/xlators/features/marke > > >> index 902b8e5..f3d2507 100644 > > >> --- a/xlators/features/marker/src/marker-quota.c > > >> +++ b/xlators/features/marker/src/marker-quota.c > > >> @@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, > synctask_fn_t task, gf_boole > > >> } > > >> > > >> if (spawn) { > > >> - ret = synctask_new1 (this->ctx->env, > 1024 * 16, task, > > >> + ret = synctask_new1 (this->ctx->env, 0, > task, > > >> > mq_synctask_cleanup, NULL, args); > > >> if (ret) { > > >> gf_log (this->name, > GF_LOG_ERROR, "Failed to spawn " > > >> > > >> > > >> On 6/8/2017 19:02, Sanoj Unnikrishnan wrote: > > >> > I would still be worried about the Invalid read/write. > IMO whether an illegal access causes a crash depends on whether the page is > currently mapped. > > >> > So, it could so happen that there is a use after free / > use outside of bounds happening in the code and it turns out that this > location gets mapped in a different (unmapped) page when IO threads is not > loaded. > > >> > > > >> > Could you please share the valgrind logs as well. > > >> > > > >> > On Wed, Jun 7, 2017 at 8:22 PM, Kinglong Mee < > [email protected] <mailto:[email protected]> <mailto: > [email protected] <mailto:[email protected]>> <mailto: > [email protected] <mailto:[email protected]> <mailto: > [email protected] <mailto:[email protected]>>>> wrote: > > >> > > > >> > After deleting io-threads from the vols, quota > operates (list/set/modify) lets glusterfsd crash. > > >> > I use it at CentOS 7 (CentOS Linux release > 7.3.1611) with glusterfs 3.8.12. > > >> > It seems the stack corrupt, when testing with the > following diff, glusterfsd runs correctly. > > >> > > > >> > There are two questions as, > > >> > 1. When using valgrind, it shows there are many > "Invalid read/write" when with io-threads. > > >> > Why glusterfsd runs correctly with io-threads? > but crash without io-threads? > > >> > > > >> > 2. With the following diff, valgrind also shows > many "Invalid read/write" when without io-threads? > > >> > but no any crash. > > >> > > > >> > Any comments are welcome. > > >> > > > >> > Revert http://review.gluster.org/11499 < > http://review.gluster.org/11499> <http://review.gluster.org/11499 < > http://review.gluster.org/11499>> <http://review.gluster.org/11499 < > http://review.gluster.org/11499> <http://review.gluster.org/11499 < > http://review.gluster.org/11499>>> seems better than the diff. > > >> > > > >> > diff --git a/xlators/features/marker/src/marker-quota.c > b/xlators/features/marke > > >> > index 902b8e5..f3d2507 100644 > > >> > --- a/xlators/features/marker/src/marker-quota.c > > >> > +++ b/xlators/features/marker/src/marker-quota.c > > >> > @@ -1075,7 +1075,7 @@ mq_synctask1 (xlator_t *this, > synctask_fn_t task, gf_boole > > >> > } > > >> > > > >> > if (spawn) { > > >> > - ret = synctask_new1 > (this->ctx->env, 1024 * 16, task, > > >> > + ret = synctask_new1 > (this->ctx->env, 0, task, > > >> > > mq_synctask_cleanup, NULL, args); > > >> > if (ret) { > > >> > gf_log (this->name, > GF_LOG_ERROR, "Failed to spawn " > > >> > > > >> > -----------------------------------test steps > ---------------------------------- > > >> > 1. gluster volume create gvtest node1:/test/ > node2:/test/ > > >> > 2. gluster volume start gvtest > > >> > 3. gluster volume quota enable gvtest > > >> > > > >> > 4. "deletes io-threads from all vols" > > >> > 5. reboot node1 and node2. > > >> > 6. sh quota-set.sh > > >> > > > >> > # cat quota-set.sh > > >> > gluster volume quota gvtest list > > >> > gluster volume quota gvtest limit-usage / 10GB > > >> > gluster volume quota gvtest limit-usage /1234 1GB > > >> > gluster volume quota gvtest limit-usage /hello 1GB > > >> > gluster volume quota gvtest limit-usage /test 1GB > > >> > gluster volume quota gvtest limit-usage /xyz 1GB > > >> > gluster volume quota gvtest list > > >> > gluster volume quota gvtest remove /hello > > >> > gluster volume quota gvtest remove /test > > >> > gluster volume quota gvtest list > > >> > gluster volume quota gvtest limit-usage /test 1GB > > >> > gluster volume quota gvtest remove /xyz > > >> > gluster volume quota gvtest list > > >> > > > >> > -----------------------glusterfsd crash without > the diff-------------------------------- > > >> > > > >> > /usr/local/lib/libglusterfs. > so.0(_gf_msg_backtrace_nomem+0xf5)[0x7f6e1e950af1] > > >> > /usr/local/lib/libglusterfs. > so.0(gf_print_trace+0x21f)[0x7f6e1e956943] > > >> > /usr/local/sbin/glusterfsd( > glusterfsd_print_trace+0x1f)[0x409c83] > > >> > /lib64/libc.so.6(+0x35250)[0x7f6e1d025250] > > >> > /lib64/libc.so.6(gsignal+0x37)[0x7f6e1d0251d7] > > >> > /lib64/libc.so.6(abort+0x148)[0x7f6e1d0268c8] > > >> > /lib64/libc.so.6(+0x74f07)[0x7f6e1d064f07] > > >> > /lib64/libc.so.6(+0x7baf5)[0x7f6e1d06baf5] > > >> > /lib64/libc.so.6(+0x7c3e6)[0x7f6e1d06c3e6] > > >> > /usr/local/lib/libglusterfs.so.0(__gf_free+0x311)[ > 0x7f6e1e981327] > > >> > /usr/local/lib/libglusterfs. > so.0(synctask_destroy+0x82)[0x7f6e1e995c20] > > >> > /usr/local/lib/libglusterfs. > so.0(synctask_done+0x25)[0x7f6e1e995c47] > > >> > /usr/local/lib/libglusterfs. > so.0(synctask_switchto+0xcf)[0x7f6e1e996585] > > >> > /usr/local/lib/libglusterfs. > so.0(syncenv_processor+0x60)[0x7f6e1e99663d] > > >> > /lib64/libpthread.so.0(+0x7dc5)[0x7f6e1d7a2dc5] > > >> > /lib64/libc.so.6(clone+0x6d)[0x7f6e1d0e773d] > > >> > > > >> > or > > >> > > > >> > package-string: glusterfs 3.8.12 > > >> > /usr/local/lib/libglusterfs. > so.0(_gf_msg_backtrace_nomem+0xf5)[0x7fa15e623af1] > > >> > /usr/local/lib/libglusterfs. > so.0(gf_print_trace+0x21f)[0x7fa15e629943] > > >> > /usr/local/sbin/glusterfsd( > glusterfsd_print_trace+0x1f)[0x409c83] > > >> > /lib64/libc.so.6(+0x35250)[0x7fa15ccf8250] > > >> > /lib64/libc.so.6(gsignal+0x37)[0x7fa15ccf81d7] > > >> > /lib64/libc.so.6(abort+0x148)[0x7fa15ccf98c8] > > >> > /lib64/libc.so.6(+0x74f07)[0x7fa15cd37f07] > > >> > /lib64/libc.so.6(+0x7dd4d)[0x7fa15cd40d4d] > > >> > /lib64/libc.so.6(__libc_ > calloc+0xb4)[0x7fa15cd43a14] > > >> > /usr/local/lib/libglusterfs.so.0(__gf_calloc+0xa7)[ > 0x7fa15e653a5f] > > >> > /usr/local/lib/libglusterfs.so.0(iobref_new+0x2b)[ > 0x7fa15e65875a] > > >> > /usr/local/lib/glusterfs/3.8. > 12/rpc-transport/socket.so(+0xa98c)[0x7fa153a8398c] > > >> > /usr/local/lib/glusterfs/3.8. > 12/rpc-transport/socket.so(+0xacbc)[0x7fa153a83cbc] > > >> > /usr/local/lib/glusterfs/3.8. > 12/rpc-transport/socket.so(+0xad10)[0x7fa153a83d10] > > >> > /usr/local/lib/glusterfs/3.8. > 12/rpc-transport/socket.so(+0xb2a7)[0x7fa153a842a7] > > >> > /usr/local/lib/libglusterfs. > so.0(+0x97ea9)[0x7fa15e68eea9] > > >> > /usr/local/lib/libglusterfs. > so.0(+0x982c6)[0x7fa15e68f2c6] > > >> > /lib64/libpthread.so.0(+0x7dc5)[0x7fa15d475dc5] > > >> > /lib64/libc.so.6(clone+0x6d)[0x7fa15cdba73d] > > >> > > > >> > _______________________________________________ > > >> > Gluster-devel mailing list > > >> > [email protected] <mailto: > [email protected]> <mailto:[email protected] <mailto: > [email protected]>> <mailto:[email protected] <mailto: > [email protected]> <mailto:[email protected] <mailto: > [email protected]>>> > > >> > http://lists.gluster.org/ > mailman/listinfo/gluster-devel <http://lists.gluster.org/ > mailman/listinfo/gluster-devel> <http://lists.gluster.org/ > mailman/listinfo/gluster-devel <http://lists.gluster.org/ > mailman/listinfo/gluster-devel>> <http://lists.gluster.org/ > mailman/listinfo/gluster-devel <http://lists.gluster.org/ > mailman/listinfo/gluster-devel> <http://lists.gluster.org/ > mailman/listinfo/gluster-devel <http://lists.gluster.org/ > mailman/listinfo/gluster-devel>>> > > >> > > > >> > > > >> > > >> > > > > > > > > > >
_______________________________________________ Gluster-devel mailing list [email protected] http://lists.gluster.org/mailman/listinfo/gluster-devel
