> > > Program received signal SIGSEGV, Segmentation fault.
> > > [Switching to Thread 0xf7dbfb40 (LWP 7)]
> > > 0xf7f7f988 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
> > > (gdb) bt
> > > #0  0xf7f7f988 in __lll_unlock_elision () from /usr/lib/libpthread.so.0
> > > #1  0xf7f790e0 in __pthread_mutex_unlock_usercnt () from /usr/lib/libpthread.so.0
> > > #2  0xf7f79bff in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
> > > #3  0x08061bfe in item_crawler_thread ()
> > > #4  0xf7f75f20 in start_thread () from /usr/lib/libpthread.so.0
> > > #5  0xf7ead94e in clone () from /usr/lib/libc.so.6
> >
> > Holy crap, lock elision. I have one machine with a Haswell chip here, but
> > I'll have to USB boot. Is getting an Arch live image especially time
> > consuming?
>
> Not at all; if you download the latest install ISO
> (https://www.archlinux.org/download/) it is a live CD and you can boot
> straight into an Arch environment. You can do an install if you want, or
> just run live, install any necessary packages (`pacman -S base-devel gdb`),
> and go from there.
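For anyone hitting this thread later: the backtrace reads as
pthread_cond_wait() releasing its mutex (the __pthread_mutex_unlock_usercnt
frame), and glibc's __lll_unlock_elision faults when a TSX-elided mutex is
unlocked by a thread that never actually locked it; on older chips the same
mistake tends to pass silently. A minimal sketch of the locking discipline
involved, with hypothetical names (crawler_lock, crawler_running, and so on),
not the actual memcached source:

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the crawler's shared state. */
static pthread_mutex_t crawler_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  crawler_cond = PTHREAD_COND_INITIALIZER;
static bool crawler_running = true;  /* the thread's run flag */

static void *crawler_thread(void *arg) {
    (void)arg;
    /* POSIX requires the mutex to be HELD when pthread_cond_wait() is
     * called: the wait releases it, sleeps, and re-locks it on wakeup.
     * Waiting without holding it is undefined behavior; a plain futex
     * lock often shrugs that off, but an elided (TSX) lock faults in
     * __lll_unlock_elision when cond_wait tries to release a mutex this
     * thread never acquired -- the exact top frame in the backtrace. */
    pthread_mutex_lock(&crawler_lock);
    /* Checking the run flag under the lock, before the first wait, also
     * covers a stop request that lands before the thread fully starts. */
    while (crawler_running) {
        pthread_cond_wait(&crawler_cond, &crawler_lock);
        /* ... do one crawl pass ... */
    }
    /* Unlock on the way out, or the thread exits still holding the lock
     * and whoever tries to stop or join it blocks forever. */
    pthread_mutex_unlock(&crawler_lock);
    return NULL;
}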
Okay, seems like I'll have to give it a shot since this still isn't working
well.

> > https://github.com/dormando/memcached/tree/crawler_fix
> >
> > Can you try this? The lock elision might've made my "undefined behavior"
> > mistake of not holding a lock before initially waiting on the condition
> > fatal.
> >
> > A further fix might be required, as it's possible someone could kill the
> > do_etc flag before the thread fully starts and it'd drop out with the
> > lock held. That would be an incredible feat, though.
>
> The good news here is that, now that we've found our way to lock elision,
> both 64-bit and 32-bit builds (including one straight from git and outside
> the normal packaging build machinery) blow up in the same place. No
> segfault after applying this patch, so we've made progress.

I love progress.

> > > > > > Thanks!
> > > > > >
> > > > > > On the 64bit host, can you try increasing the sleep on
> > > > > > t/lru-crawler.t:39 from 3 to 8 and try again? I was trying to be
> > > > > > clever, but that may not be working out.
> > > > >
> > > > > Didn't change anything; same two failures with the same output
> > > > > listed.
> > > >
> > > > I feel like something's a bit different between your two tests. In
> > > > the first set, it's definitely not crashing for the 64bit test, but
> > > > not working either. Is something weird going on with the second set
> > > > of tests? You noted it seems to be running a 32bit binary still.
> > >
> > > I'm willing to ignore the 64-bit failures for now until we figure out
> > > the 32-bit ones.
> > >
> > > In any case, I wouldn't blame the cross-compile or toolchain; these
> > > have all been built in very clean, single-architecture systemd-nspawn
> > > chroots.
> >
> > Thanks, I'm just trying to reason why it's failing in two different
> > ways. The initial failure of finding 90 items when it expected 60 is a
> > timing glitch; the other ones are this thread crashing the daemon.
>
> One machine was an i7 with TSX, thus the lock elision segfaults. The other
> is a much older Core2 machine. Enough differences there to cause problems,
> especially if we are dealing with threading-type things?

Can you give me a summary of what the Core2 machine gave you? I've built on
a core2duo and a nehalem i7 and they all work fine. I've also torture-tested
it on a brand new 16-core (2x8) xeon.

> On the i7 machine, I think we're still experiencing segfaults. Running
> just the LRU test; note the two "undef" values showing up again:
>
> $ prove t/lru-crawler.t
> t/lru-crawler.t .. 93/189
> #   Failed test 'slab1 now has 60 used chunks'
> #   at t/lru-crawler.t line 57.
> #          got: '90'
> #     expected: '60'
>
> #   Failed test 'slab1 has 30 reclaims'
> #   at t/lru-crawler.t line 59.
> #          got: '0'
> #     expected: '30'
>
> #   Failed test 'disabled lru crawler'
> #   at t/lru-crawler.t line 69.
> #          got: undef
> #     expected: 'OK
> # '
>
> #   Failed test at t/lru-crawler.t line 72.
> #          got: undef
> #     expected: 'no'
> # Looks like you failed 4 tests of 189.
> t/lru-crawler.t .. Dubious, test returned 4 (wstat 1024, 0x400)
> Failed 4/189 subtests
>
> Changing the `sleep 3` to `sleep 8` gives non-deterministic results; two
> runs in a row were different.
>
> $ prove t/lru-crawler.t
> t/lru-crawler.t .. 93/189
> #   Failed test 'slab1 now has 60 used chunks'
> #   at t/lru-crawler.t line 57.
> #          got: '90'
> #     expected: '60'
>
> #   Failed test 'slab1 has 30 reclaims'
> #   at t/lru-crawler.t line 59.
> #          got: '0'
> #     expected: '30'
>
> #   Failed test 'ifoo29 == 'ok''
> #   at /home/dan/memcached/t/lib/MemcachedTest.pm line 59.
> #          got: undef
> #     expected: 'VALUE ifoo29 0 2
> # ok
> # END
> # '
> t/lru-crawler.t .. Failed 10/189 subtests
>
> Test Summary Report
> -------------------
> t/lru-crawler.t (Wstat: 13 Tests: 182 Failed: 3)
>   Failed tests:  96-97, 182
>   Non-zero wait status: 13
>   Parse errors: Bad plan.  You planned 189 tests but ran 182.
> Files=1, Tests=182,  8 wallclock secs ( 0.03 usr  0.00 sys +  0.04 cusr
> 0.00 csys =  0.07 CPU)
> Result: FAIL
>
> $ prove t/lru-crawler.t
> t/lru-crawler.t .. 93/189
> #   Failed test 'slab1 now has 60 used chunks'
> #   at t/lru-crawler.t line 57.
> #          got: '90'
> #     expected: '60'
>
> #   Failed test 'slab1 has 30 reclaims'
> #   at t/lru-crawler.t line 59.
> #          got: '0'
> #     expected: '30'
>
> #   Failed test 'sfoo28 == <undef>'
> #   at /home/dan/memcached/t/lib/MemcachedTest.pm line 53.
> #          got: undef
> #     expected: 'END
> # '
> t/lru-crawler.t .. Failed 11/189 subtests
>
> Test Summary Report
> -------------------
> t/lru-crawler.t (Wstat: 13 Tests: 181 Failed: 3)
>   Failed tests:  96-97, 181
>   Non-zero wait status: 13
>   Parse errors: Bad plan.  You planned 189 tests but ran 181.
> Files=1, Tests=181,  8 wallclock secs ( 0.02 usr  0.00 sys +  0.03 cusr
> 0.00 csys =  0.05 CPU)
> Result: FAIL

Ok, I might still be goofing the lock somewhere. Can you see if memcached is
crashing at all during these tests? Inside the test script you can see it's
just a few raw commands to copy/paste and try yourself.

You can also use an environment variable to start a memcached external to
the tests within a debugger:

    if ($ENV{T_MEMD_USE_DAEMON}) {
        my ($host, $port) = ($ENV{T_MEMD_USE_DAEMON} =~ m/^([^:]+):(\d+)$/);

T_MEMD_USE_DAEMON="127.0.0.1:11211" or something, I think; haven't used that
in a while.

Thanks!
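PS: guessing from that snippet (the full handling lives in
t/lib/MemcachedTest.pm, so treat this as a sketch rather than gospel),
running the daemon under a debugger and pointing the tests at it would look
something like:

    $ gdb --args ./memcached -p 11211
    (gdb) run

    # then, from a second terminal in the source tree:
    $ T_MEMD_USE_DAEMON="127.0.0.1:11211" prove t/lru-crawler.t

That way a crash drops you at the gdb prompt with a live backtrace instead
of just a failed subtest.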
