I am seeing very perplexing segfaults and standard allocation exceptions in my 
native code (.so files shipped via the distributed cache), which is called via 
JNI from the map task.  This code runs perfectly fine (on the same data) 
outside Hadoop.  Even in Hadoop standalone mode (no cluster), it still 
segfaults.  The memory footprint is quite small, and runtime inspection shows 
plenty of free memory, yet I still get segfaults and exceptions.

I'm starting to wonder if this is a thread issue.

The native code is not *specifically* thread safe (not compiled with pthreads 
or anything like that).

However, it is also not run in any concurrent fashion except with respect to 
the JVM itself.  For example, my map task doesn't make parallel calls through 
JNI to the native code on concurrent Java threads, nor does the native code 
itself spawn any threads (as I said, it isn't even compiled with pthreads).
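
For concreteness, the call pattern is roughly this (the class, method, and 
library names here are made up, not my real ones):

    public class NativeBridge {
        static {
            // The .so is shipped via the distributed cache and must be on
            // java.library.path for the child JVM.
            System.loadLibrary("mylib");
        }
        // Implemented in the .so; the calling Java thread blocks here
        // until the native code returns.
        public static native int processRecord(byte[] record);
    }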

However, there are clearly other "threads" of execution.  For example, the JVM 
itself is running, including whatever supplemental threads the JVM involves 
(the garbage collector?).  In addition, my Java mapper is running two Java 
threads while the native code executes.  One calls the native code and 
effectively blocks until the native code returns through JNI.  The other just 
spins and sends reports and statuses to the JobTracker at regular intervals to 
prevent the task from being killed.  It doesn't do anything else particularly 
memory-related, and certainly no JNI/native calls; it's very basic, just 
sleep 'n report, sleep 'n report.
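
That second thread is basically the following (a rough sketch against the old 
org.apache.hadoop.mapred API; the class name is illustrative):

    import org.apache.hadoop.mapred.Reporter;

    // Keep-alive thread: pings the JobTracker so the map task blocked
    // in JNI isn't killed for failing to report progress.
    class Heartbeat implements Runnable {
        private final Reporter reporter;
        private volatile boolean done = false;

        Heartbeat(Reporter reporter) { this.reporter = reporter; }

        void stop() { done = true; }

        public void run() {
            while (!done) {
                reporter.progress();  // "still alive"
                try {
                    Thread.sleep(60000);  // sleep 'n report
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }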

So, the question is: in the scenario I have described, is there any reason to 
suspect that the cause of my problems is some sort of thread trampling between 
the native code and something else in the surrounding environment (the JVM 
itself, say), especially in the context of the Hadoop infrastructure?  It 
doesn't really make any sense to me, but I'm running out of ideas.

I've experimented with "mapred.child.java.opts" and "mapred.child.ulimit" but 
nothing really seems to have any effect on the frequency of these errors.
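
For reference, I've been setting those properties on the job configuration 
roughly like this (the class name and the values below are placeholders, not 
the exact numbers I tried; as I understand it, mapred.child.ulimit is in 
kilobytes):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);
    // Heap for the child JVM that runs the map task and the native code.
    conf.set("mapred.child.java.opts", "-Xmx1024m");
    // Virtual-memory limit for the child process (KB); it has to cover
    // the JVM itself plus whatever the native code allocates.
    conf.set("mapred.child.ulimit", "2097152");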

These segfaults and standard allocation exceptions (in the face of plenty of 
free memory) have basically brought my work to a halt, and I just don't know 
what else to try.

Thanks.

________________________________________________________________________________
Keith Wiley               [email protected]               www.keithwiley.com

"And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!"
  -- Homer Simpson
________________________________________________________________________________


