> On Sat, Apr 19, 2014 at 2:45 PM, dormando <[email protected]> wrote:
>       > One machine was an i7 with TSX, thus the lock elision segfaults. The 
> other is a much older Core2 machine. Enough differences there to
>       cause
>       > problems, especially if we are dealing with threading-type things?
>
> Can you give me a summary of what the core2 machine gave you? I've built
> on a core2duo and nehalem i7 and they all work fine. I've also torture
> tested it on a brand new 16 core (2x8) xeon.
>
>
> I ran the test suite on the Core2 a number of times (at least 5). Sometimes 
> it completes without failure, other times I still get these two
> failures. This is with `sleep 3` changed to `sleep 8`.
>
> #   Failed test 'slab1 now has 60 used chunks'
> #   at t/lru-crawler.t line 57.
> #          got: '90'
> #     expected: '60'
>
> #   Failed test 'slab1 has 30 reclaims'
> #   at t/lru-crawler.t line 59.
> #          got: '0'
> #     expected: '30'
> # Looks like you failed 2 tests of 189.
> t/lru-crawler.t ...... Dubious, test returned 2 (wstat 512, 0x200)
> Failed 2/189 subtests

Makes no goddamn sense. Maybe the fix below will.. fix it.
  
>
>       > On the i7 machine, I think we're still experiencing segfaults. 
> Running just the LRU test; note the two "undef" values showing up
>       again:
>       >
>
> Ok. I might still be goofing the lock somewhere. Can you see if memcached
> is crashing at all during these tests? Inside the test script you can see
> it's just a few raw commands to copy/paste and try yourself.
>
> You can also use an environment variable to start a memcached external to
> the tests within a debugger:
>     if ($ENV{T_MEMD_USE_DAEMON}) {
>         my ($host, $port) = ($ENV{T_MEMD_USE_DAEMON} =~
> m/^([^:]+):(\d+)$/);
>
> T_MEMD_USE_DAEMON="127.0.0.1:11211" or something, I think. haven't used
> that in a while.
>
>
> Simple repro, running standalone, no other commands have been issued:
> $ nc localhost 11211
> lru_crawler enable
> OK
> lru_crawler crawl 1
> OK
> lru_crawler disable
>
> SIGSEGV happens, here is the backtrace (surprised to see it in 
> start_thread...):
>
> (gdb) bt
> #0  0x00007ffff79881c8 in __lll_unlock_elision () from 
> /usr/lib/libpthread.so.0
> #1  0x00007ffff7982fc7 in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /usr/lib/libpthread.so.0
> #2  0x0000000000414f61 in item_crawler_thread (arg=<optimized out>) at 
> items.c:771
> #3  0x00007ffff797f0a2 in start_thread () from /usr/lib/libpthread.so.0
> #4  0x00007ffff76b4d1d in clone () from /usr/lib/libc.so.6
>
> It does NOT segfault if you run enable immediately followed by disable, with 
> no `crawl 1` in between.

Good lord I suck at this. I really wish I could make that
pthread_cond_wait "undefined" behavior actually error out so I don't test
this on 3+ platforms and then have it error out elsewhere :/

Just force-pushed this:
https://github.com/dormando/memcached/tree/crawler_fix

At some point I'd refactored it and didn't push the unlock far enough
south. Now it actually unlocks when it's stopping the thread...

Please try again. Wonder if I can somehow fund getting a haswell NUC
bought just for my build VMs. Will TSX work within a VM..?

None of the other places I intend to run build VMs have lock elision...

Thanks for your patience on this. It's been a huge help!

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"memcached" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.