Hi Christian,

On Fri, 2018-02-02 09:27:59 +0100, Christian Grothoff <groth...@gnunet.org> wrote:
> This is very strange, the loop should not fix this, as pthread_join
> should simply block (not race!) until the thread is done. In fact, I
> generally think the right answer to ESRCH would be to die, as to me this
> indicates some kind of memory corruption or other severe invariant
> violation.

That assumption doesn't match my experience, though. Simply calling
pthread_join() again after a delay of 10 ms did the job. (I added the loop
to allow a few more tries if needed.)

> Now, given that you mentioned changes related to popen()-logic in your
> own code, I wonder if the change in your _application_ logic related to
> fork() may be interoperating badly with threads. In particular, after
> you fork(), all of the "other" threads will be gone, so if you fork()
> and then continue any MHD-interaction related to the threads spawned by
> MHD, that is likely to be, eh, problematic --- and may show up with an
> ESRCH. However, that doesn't quite explain to me why putting this in a
> loop with sleeps might fix it. (But I don't know enough about your code.)

The code in use is https://github.com/famzah/popen-noshell , with a small
wrapper that makes it look like popen()/pclose(): the wrapper stores the
needed clean-up struct in a hash table with the fp as its key. pclose()
then uses the fp to look up the clean-up struct pointer and passes it to
the simplified pclose() variant.

> Regardless, the loop/sleep is a very, very wrong fix, and I strongly
> suspect the problem is in your code (or how you use the MHD API, in
> conjunction with fork()).

You're completely right here. I wrote some more small test programs, and I
observe two things:

  * pthread_join() indeed waits as promised in my tests; and
  * I cannot reproduce that non-waiting / failing behavior with any of my
    test attempts so far.

I did, however, find _one_ similar bug report, where pthread_join() failed
in a similar way:

  https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/1427981

Unfortunately, the provided testcase is incorrect (see my comment there),
and this bug report was never resolved, so I don't know whether the same
mistake exists in their original application.

However, since libmicrohttpd just ignores the thread's result (passing a
NULL pointer to pthread_join()), and given my observation that the call
works on a second try, I'm quite confident that something different is
happening here. Looping until success (limited to a small, justifiable
timespan) isn't a correct fix, of course. And indeed, pthread_join()
probably should wait, so I'm off again trying to find out in which
situations it might fail to do so.

Kind regards,
JBG

-- 
Getslash GmbH, Hermann-Johenning-Platz 2, 59302 Oelde
Tel: +49-2522-834349-5    Fax: +49-2522-834349-1
http://www.getslash.de    Mobile: +49-152-33822499
Registered office: Oelde
Commercial register: Amtsgericht Münster, HRB 11911
VAT ID: DE 815060326
Managing directors: Andre Peitz, Tobias Hanisch