samskalicky commented on issue #13438: libc getenv is not threadsafe
URL: 
https://github.com/apache/incubator-mxnet/issues/13438#issuecomment-444592297
 
 
   Heres the relevant code from the version of libc (2.17) that was being used 
in the failure: 
   getenv.c: https://pastebin.com/gj4XZgdE 
   setenv.c: https://pastebin.com/NuTfDdDp
   
   It looks like the issue is the __environ variable being iterated over in 
lines 15, 32, 55. This variable is defined in setenv.c:
   
   ```
   #if !_LIBC
   # define __environ      environ
   # ifndef HAVE_ENVIRON_DECL
   extern char **environ;
   # endif
   #endif
   ```
   
   So there is this nasty global pointer to the environment. While setenv has a 
lock around changing the environment, theres no lock for getenv so (as 
mentioned in the block post linked in the description) :
   
   > "this is a char**, so it's supposed to have a bunch of pointers to char* 
buffers, one for each var=val pairing. What if that region has been reused and 
now contains a bogus pointer into the weeds? Yep, segfault."
   
   So I think @KellenSunderland's suggestion of adding a lock/mutex in dmls 
parameter.h's GetEnv and SetEnv should prevent any failure due to MXNet's 
various threads accessing the environment simultaneously. Although without a 
reproducible test case we're unable to validate this. 
   
   Anybody have any ideas for reproducing something similar to this code from 
the blog post with mxnet's engine tests so that we have a reproducible test 
case? 
   
   ```
   #include <stdio.h>
   #include <stdlib.h>
   #include <pthread.h>
    
   static void* worker(void* arg) {
     for (;;) {
       int i;
       char var[256], *p = var;
    
       for (i = 0; i < 8; ++i) {
         *p++ = 65 + (random() % 26);
       }
    
       *p++ = '\0';
    
       setenv(var, "test", 1);
     }
    
     return NULL;
   }
    
   int main() {
     pthread_t t;
    
     printf("start\n");
     setenv("foo", "bar", 0);
     pthread_create(&t, NULL, worker, 0);
    
     for (;;) {
       getenv("foo");
     }
    
     return 0;
   }
   ```
   
   I tried simplifying the start_stop test 
https://github.com/apache/incubator-mxnet/blob/master/tests/cpp/engine/threaded_engine_test.cc#L124
 for just the threaded_engine_perdevice and stop/started it a few hundred times 
and it didnt cause a segfault.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to