On Wed, Nov 09, 2016 at 03:17:06PM +0200, Otto Kekäläinen wrote:
> 2016-11-08 11:08 GMT+02:00 Thadeu Lima de Souza Cascardo
> <casca...@debian.org>:
> > Hey,
> >
> > I built mariadb on my powerpc G4, it took a while and I got some OOM
> > during some of the tests. So those tests failed, but the package got
> > built anyway. I wonder if a simple rebuild would make it work on the
> > build machine.
> >
> > I will see if I can get it to build on one of the porter machines.
>
> Thanks for looking into the issue. To me it looks like powerpc status
> is already good
> (https://buildd.debian.org/status/package.php?p=mariadb-10.0) but it
> is good if you can check it and perhaps do some improvements.

So this requires further investigation.

After running a build in a jessie schroot on partch, mysqld deadlocks
during the tests. Investigating, I found at least two threads blocked on
the same lock inside jemalloc's malloc/free.

The reason for the deadlock is that one of the threads segfaulted while
exiting, at a point where it held the lock in order to destroy its tcache.
That segfault was caught by a signal handler from mariadb, which ended up
calling malloc, which tried to take the same mutex, hence the deadlock.

Now, of course a signal handler must be careful about what it does, so at
least this must be fixed. But the root cause is the segfault, which should
not have happened.

I wrote some tests using jemalloc and pthreads and found a small
reproducer (below), which crashes, though at a different point. Even though
only one thread calls malloc/free (the main thread only calls pthread_join
and does not use malloc/free directly), I can't reproduce the segfault on
my single-CPU machine, while it reproduces fairly well on partch.

In a sid schroot, this doesn't reproduce. As jemalloc has not changed much
between jessie and sid (though upstream is fairly different and has a patch
regarding pthread __nptl_deallocate_tsd that does not apply to 3.6), I can
only consider glibc as the difference that would explain it. It is quite
possible that the root cause is some odd interaction between the glibc nptl
code and jemalloc.

Regards.
Cascardo.

---
#include <pthread.h>
#include <stdlib.h>

/* Worker thread: allocate and immediately free blocks of growing size. */
void *thread_run(void *arg)
{
	int i;

	for (i = 2; i < 10000; i++) {
		free(malloc(i * 4));
	}
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t t1;

	/* The main thread never calls malloc/free directly. */
	pthread_create(&t1, NULL, thread_run, NULL);
	pthread_join(t1, NULL);
	return 0;
}
---
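
On the signal handler side of the problem: the deadlock comes from the handler calling malloc while the interrupted thread already holds the allocator lock. As a rough sketch of what a safer handler could look like (this is not mariadb's code; the names format_signal_message, crash_handler and install_crash_handler are made up for illustration), the handler below sticks to write() and _exit(), which POSIX lists as async-signal-safe, and formats its message by hand into a stack buffer:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: put "fatal signal <n>\n" into buf without calling
 * malloc, stdio, or anything else that could take the allocator lock.
 * Returns the number of bytes stored; no NUL terminator is added. */
static size_t format_signal_message(char *buf, size_t len, int signo)
{
	static const char prefix[] = "fatal signal ";
	char digits[12];
	size_t n = 0, d = 0, i;
	unsigned int v = signo < 0 ? 0u : (unsigned int)signo;

	for (i = 0; i < sizeof(prefix) - 1 && n < len; i++)
		buf[n++] = prefix[i];

	/* Convert the number by hand; digits come out in reverse order. */
	do {
		digits[d++] = (char)('0' + v % 10);
		v /= 10;
	} while (v != 0 && d < sizeof(digits));

	while (d > 0 && n < len)
		buf[n++] = digits[--d];

	if (n < len)
		buf[n++] = '\n';
	return n;
}

/* Hypothetical handler: report the signal and terminate, using only
 * async-signal-safe calls (write and _exit). */
static void crash_handler(int signo)
{
	char msg[64];
	size_t n = format_signal_message(msg, sizeof(msg), signo);
	ssize_t ignored = write(STDERR_FILENO, msg, n);

	(void)ignored;
	_exit(128 + signo);
}

/* Install the handler for SIGSEGV; returns 0 on success, -1 on error. */
int install_crash_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = crash_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_crash_handler() early in main would be enough to try it out. A handler like this cannot produce the stack trace that a server usually wants on a crash, so it only illustrates the constraint and is not a drop-in replacement.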