Hi,

I have a distributed file server front end to Hadoop that uses the libhdfs C API to talk to Hadoop. Normally the file server forks on each new client connection, but this does not work with the libhdfs shared library (which is loaded using dlopen). If the server runs in single-process mode (no forking, so it can handle only one client at a time), everything works fine.

I have tried changing it so the server disconnects the Hadoop connection before forking and having both processes re-connect post fork. Essentially in the server:

    hdfsDisconnect(...);
    pid = fork();
    hdfsConnect(...);
    if (pid == 0)
        ...
    else
        ...

This causes a hang in the child process on Connect with the following backtrace:

    (gdb) bt
    #0  0x00000034d160ad09 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  0x00002ace492559f7 in os::PlatformEvent::park () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
    #2  0x00002ace4930a5da in ObjectMonitor::wait () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
    #3  0x00002ace49307b13 in ObjectSynchronizer::wait () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
    #4  0x00002ace490cf5fb in JVM_MonitorWait () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
    #5  0x00002ace49c87f50 in ?? ()
    #6  0x0000000000000001 in ?? ()
    #7  0x00002ace4cd84d10 in ?? ()
    #8  0x000000003f800000 in ?? ()
    #9  0x00002ace49c8841d in ?? ()
    #10 0x00007fff0b4d04c0 in ?? ()
    #11 0x0000000000000000 in ?? ()

Leaving the connection open in the server:

    pid = fork();
    if (pid == 0)
        ...
    else
        ...

also produces a hang in the child:

    (gdb) bt
    #0  0x00000034d160ad09 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    #1  0x00002b3d7193d9f7 in os::PlatformEvent::park () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
    #2  0x00002b3d719f25da in ObjectMonitor::wait () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
    #3  0x00002b3d719efb13 in ObjectSynchronizer::wait () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
    #4  0x00002b3d717b75fb in JVM_MonitorWait () from /afs/nd.edu/user37/ccl/software/external/java/jdk/jre/lib/amd64/server/libjvm.so
    #5  0x00002b3d7236ff50 in ?? ()
    #6  0x0000000000000000 in ?? ()

Does anyone have a suggestion on debugging/fixing this?

Thanks for any help,

--
- Patrick Donnelly
