samskalicky commented on issue #13438: libc getenv is not threadsafe URL: https://github.com/apache/incubator-mxnet/issues/13438#issuecomment-444592297 Heres the relevant code from the version of libc (2.17) that was being used in the failure: getenv.c: https://pastebin.com/gj4XZgdE setenv.c: https://pastebin.com/NuTfDdDp It looks like the issue is the __environ variable being iterated over in lines 15, 32, 55. This variable is defined in setenv.c: ``` #if !_LIBC # define __environ environ # ifndef HAVE_ENVIRON_DECL extern char **environ; # endif #endif ``` So there is this nasty global pointer to the environment. While setenv has a lock around changing the environment, theres no lock for getenv so (as mentioned in the block post linked in the description) : > "this is a char**, so it's supposed to have a bunch of pointers to char* buffers, one for each var=val pairing. What if that region has been reused and now contains a bogus pointer into the weeds? Yep, segfault." So I think @KellenSunderland's suggestion of adding a lock/mutex in dmls parameter.h's GetEnv and SetEnv should prevent any failure due to MXNet's various threads accessing the environment simultaneously. Although without a reproducible test case we're unable to validate this. Anybody have any ideas for reproducing something similar to this code from the blog post with mxnet's engine tests so that we have a reproducible test case? ``` #include <stdio.h> #include <stdlib.h> #include <pthread.h> static void* worker(void* arg) { for (;;) { int i; char var[256], *p = var; for (i = 0; i < 8; ++i) { *p++ = 65 + (random() % 26); } *p++ = '\0'; setenv(var, "test", 1); } return NULL; } int main() { pthread_t t; printf("start\n"); setenv("foo", "bar", 0); pthread_create(&t, NULL, worker, 0); for (;;) { getenv("foo"); } return 0; } ``` I tried simplifying the start_stop test https://github.com/apache/incubator-mxnet/blob/master/tests/cpp/engine/threaded_engine_test.cc#L124 for just the threaded_engine_perdevice and stop/started it a few hundred times and it didnt cause a segfault.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
