[Bug 1640518] Re: MongoDB Memory corruption

2016-11-15 Thread Andrew Morrow
Hi Bill - Thanks for the glibc bug link. Totally understand about people being out, not a problem. However, I'm not very familiar with the development process for upstream glibc fixes to make their way into an LTS release. Do you have a rough estimate of the timeline for that landing in

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-14 Thread Andrew Morrow
Hi Bill - Thanks for the update, and for clarifying that this is POWER 16.04 only. We are very happy to be at a root cause for this issue - it had us pretty worried! We really appreciate all the help from everyone involved here. Will there be an upstream glibc bug associated with this ticket

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-14 Thread Andrew Morrow
Aaron, thank you very much for running that experiment and confirming that this is an issue in libc. I think the component should probably be updated? Also, would you like us to try to continue to repro on a Skylake machine, or is this all architecture neutral code and therefore the POWER repro is

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-14 Thread Andrew Morrow
I think it is very likely that we are doing the sort of stack-based mutex pattern described above, or something similar. In particular, I'd expect that we certainly have states where we wait on a stack mutex, and then immediately unwind and destroy the mutex after we unblock. I'm working on
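For readers unfamiliar with the pattern, here is a minimal sketch of what a stack-based mutex hand-off of this kind might look like. It is an illustration only, not MongoDB code: the names `handoff`, `worker`, and `run_handoff` are hypothetical, and the value 42 is a placeholder. The waiter blocks on a mutex and condition variable that live in its own stack frame, then destroys them as soon as it unblocks, possibly while the other thread is still returning from its unlock call.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical hand-off record; it lives on the waiting thread's stack. */
struct handoff {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int done;
    int result;
};

/* Worker: publish a result under the mutex, then release it. */
static void *worker(void *arg) {
    struct handoff *h = arg;
    pthread_mutex_lock(&h->mu);
    h->result = 42;               /* placeholder value for the sketch */
    h->done = 1;
    pthread_cond_signal(&h->cv);
    pthread_mutex_unlock(&h->mu); /* the worker's code never touches *h after this */
    return NULL;
}

/* Waiter: block on the stack mutex, then destroy it right after unblocking. */
int run_handoff(void) {
    struct handoff h;
    pthread_t t;
    int rc, result;

    h.done = 0;
    h.result = 0;
    rc = pthread_mutex_init(&h.mu, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&h.cv, NULL);
    assert(rc == 0);
    rc = pthread_create(&t, NULL, worker, &h);
    assert(rc == 0);
    (void)rc; /* silence unused-variable warnings when NDEBUG is defined */

    pthread_mutex_lock(&h.mu);
    while (!h.done)
        pthread_cond_wait(&h.cv, &h.mu);
    result = h.result;
    pthread_mutex_unlock(&h.mu);

    /* Destroy immediately. The worker may still be inside its own
       pthread_mutex_unlock call at this point, so the library must not
       touch the mutex memory once the lock has been released. */
    pthread_cond_destroy(&h.cv);
    pthread_mutex_destroy(&h.mu);

    /* Joined here only to keep the sketch tidy; real code might already
       have unwound this frame and reused the stack memory. */
    pthread_join(t, NULL);
    return result;
}
```

Calling `run_handoff()` returns the worker's placeholder result. The sketch joins the worker before returning, so the frame outlives the worker; code that returns without such a join is where a late write to the mutex would land in reused stack memory.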

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-12 Thread Andrew Morrow
Regarding Ubuntu 15, I think that was a miscommunication somewhere along the line. The only versions of Ubuntu that we build for are the LTS releases (12.04, 14.04, and 16.04), and the only one of those we have ever built on POWER is 16.04. Other than Ubuntu 16.04, the only other POWER distro we

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-11 Thread Andrew Morrow
An engineer on our side did some Canary+mprotect experiments as well, but I don't happen to have details on what the approach/results were right now. I'll ask them to update this ticket with any interesting findings they may have. -- You received this bug notification because you are a member of

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-11 Thread Andrew Morrow
Adam, I agree on all points. So far, my repro running with the LD_PRELOAD hack is at 118 iterations with no crashes and going strong. Given that we had an ~5% repro rate without the LD_PRELOAD hack, this is looking very encouraging, but I'm going to let it run all weekend just to be sure. As for
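To put a rough number on "very encouraging": assuming each run fails independently with a fixed probability of about 5% (both the independence and the exact rate are simplifying assumptions, not something the thread establishes), the chance of seeing a clean streak by luck alone can be sketched as below. The function name `clean_streak_probability` is made up for this example.

```c
#include <assert.h>

/* Probability that `runs` independent runs all pass, given that each
   run fails with probability `p_fail`. Computed by repeated
   multiplication so no math library is needed. */
double clean_streak_probability(double p_fail, int runs) {
    assert(p_fail >= 0.0 && p_fail <= 1.0 && runs >= 0);
    double p = 1.0;
    for (int i = 0; i < runs; i++)
        p *= 1.0 - p_fail;
    return p;
}
```

Under those assumptions, `clean_streak_probability(0.05, 118)` comes out between 0.002 and 0.003, i.e. a fraction of a percent, so 118 clean iterations would be quite unlikely if the hack had no effect.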

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-11 Thread Andrew Morrow
Aaron, re #50, yes, you can run as many copies as you want simultaneously, as long as: 1) The --dbpathPrefix argument points to distinct paths. So resmoke.py ... --dbpathPrefix=/var/tmp/run1 and resmoke.py ... --dbpathPrefix=/var/tmp/run2, etc. 2) You specify disjoint "port ranges" with the

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-11 Thread Andrew Morrow
Peter, re #47, yes, that is certainly true. However, I'm actually finding it advantageous to load it via LD_PRELOAD, precisely because that lets me toggle lock elision on and off without needing to recompile.

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-11 Thread Andrew Morrow
That is good news that you have been able to reproduce the issue. I'm currently running the reproducer with the LD_PRELOAD disable-lock-elision hack in place, without valgrind, and I'm currently at 55 runs with no crashes. I will let it run overnight. Also, per the earlier comment about double

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-11 Thread Andrew Morrow
First, I'm not sure what I was doing wrong yesterday, but I now have the LD_PRELOAD lock-elision-disablement running. And, when running under valgrind, we no longer see the reports from valgrind. I'm now running without valgrind to see whether we still observe stack corruption. A few comments on

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-10 Thread Andrew Morrow
I don't think the interposition is working, or I'm doing something wrong. I changed pthread_mutex_lock.c to the following:
$ cat pthread_mutex_lock.c
#include
#include
#define PTHREAD_MUTEX_NO_ELISION_NP 512
extern int __pthread_mutex_lock (pthread_mutex_t *);
int pthread_mutex_lock

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-10 Thread Andrew Morrow
I have the libfoo.so.1 interposer running, I will let it run overnight and report back tomorrow with any interesting findings. -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1640518 Title: MongoDB

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-10 Thread Andrew Morrow
OK, I upgraded valgrind to 3.12 on the power machine and I can now get it to run meaningfully. We are seeing many error reports of the following form: [js_test:fsm_all_sharded_replication] 2016-11-10T16:19:58.396+ s40019| ==34604== Thread 50: [js_test:fsm_all_sharded_replication]

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-10 Thread Andrew Morrow
I tried valgrind as suggested above. By adding --show-mismatched-frees=no and removing --track-origins=yes I was able to get the process to start up without a lot of false positives. However, the server process fails to open its listening socket, because valgrind reports an unsupported syscall:

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-10 Thread Andrew Morrow
Overnight, I ran this test case on both an Ubuntu 16.04 ppc64le system and a RHEL 7.1 ppc64le system. The test ran 219 times on Ubuntu, producing 15 core dumps, for a failure rate of around 5%. Most of the time corruption was detected in the Canary ctor (before doing other work), but a few times in the

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-09 Thread Andrew Morrow
Bill - I will try again with valgrind without --track-origins=yes and post any interesting findings. Re ThreadSanitizer, we have tried before without success. The last time we tried, it didn't work because clang TSAN didn't support exceptions. Perhaps that has changed? We really like the

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-09 Thread Andrew Morrow
The following are reproduction instructions for the behavior that we are observing on Ubuntu 16.04 ppc64le. Note that we have run this same test on RHEL 7.1 ppc64le, and we do not observe any stack corruption. Note also that building and running this repro may depend on certain system libraries

[Bug 1640518] Re: MongoDB Memory corruption

2016-11-09 Thread Andrew Morrow
Here is the patch for the above comment. ** Patch added: "Apply to 3220495083b0d678578a76591f54ee1d7a5ec5df" https://bugs.launchpad.net/ubuntu/+source/gcc-5/+bug/1640518/+attachment/4775111/+files/acm.nov9.patch

[Bug 1631933] Re: upgrade to a more current mongoc library

2016-10-11 Thread Andrew Morrow
Note that upgrading to at least 1.3.5 would allow users to build the mongocxx 3.0.x C++11 driver releases against the system version of mongoc, rather than needing to build it from source. The current system version of the C driver, 1.3.1, is insufficient to build the C++ driver.

[Bug 469184] Re: ubuntuone-client-applet crashed with AttributeError in from_token_and_callback()

2010-01-19 Thread andrew-morrow
The program appears to be working correctly now. So as far as I know, the error has been fixed. -- https://bugs.launchpad.net/bugs/469184