I wrote:
>> Btw, keep in mind that there are memory-related bugs that don't show up
>> until there's something big in memory that pushes the code in question
>> up into a region with different data patterns in it (most frequently zero
>> vs. non-zero, but others are possible). IOW, maybe the code is dependent
>> on uninitialized memory, but you were getting lucky when you ran it outside
>> of Hadoop. Have you run it through valgrind or Purify or similar?
Keith Wiley wrote:
> Valgrind has turned out to be almost useless. It can't "reach"
> through the JVM through JNI to the .so code. If I don't
> tell valgrind to following children, it obviously produces
> no relevant output, but if I do tell it to follow children,
> it can't successfully launch a VM to run Java in:
> Error occurred during initialization of VM
> Unknown x64 processor: SSE2 not supported
> Sigh...any thoughts on running Valgrind on Hadoop->JVM->JNI->native code?

I actually meant something simpler: if we posit that the bug really is in
the library code but doesn't always trigger a segfault because of random
memory conditions (i.e., "getting lucky"), then running valgrind on it in a
non-Java context (i.e., the setup you said "runs perfectly fine outside
Hadoop") should detect such bugs.

If that shows nothing, and you're not passing buffers across the JNI
boundary (if you are, that could point to GC issues, perhaps subtle ones),
then I'm out of ideas. Again. Sorry. :-/

Greg
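A minimal sketch of the "non-Java context" idea, assuming the native library exposes a plain C entry point. lib_process() below is a hypothetical stand-in (swap in the real function from the .so), and run_harness() is likewise an illustrative name. The harness allocates buffers with malloc() rather than calloc() and fills in only what a real caller would, so memcheck can flag any read of bytes the library never initialized instead of the program quietly getting lucky with zeros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL stand-in for the library's real entry point (the function
 * the JNI wrapper ends up calling). Replace it with a call into the .so.
 * This placeholder just writes the n input bytes to out in reverse order. */
static void lib_process(const unsigned char *in, size_t n, unsigned char *out)
{
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++)
        out[i] = in[n - 1 - i];
}

/* Drives the library from a plain process that valgrind can run directly,
 * with no JVM in between. Both buffers come from malloc() rather than
 * calloc(), and only the input is filled in, mirroring what a real caller
 * writes; any read of bytes the library never set is then reported by
 * memcheck instead of silently "getting lucky" with zeros.
 * Returns 0 on success, -1 on allocation failure or a wrong result. */
static int run_harness(const unsigned char *data, size_t n)
{
    if (n == 0)
        return 0;                       /* nothing to process */

    unsigned char *in = malloc(n);
    unsigned char *out = malloc(n);     /* deliberately left uninitialized */
    if (in == NULL || out == NULL) {
        free(in);                       /* free(NULL) is a no-op */
        free(out);
        return -1;
    }

    memcpy(in, data, n);
    lib_process(in, n, out);

    /* Check the result so a bad run fails even without valgrind.
     * Adjust this to whatever the real function is supposed to produce. */
    int rc = 0;
    for (size_t i = 0; i < n; i++) {
        if (out[i] != data[n - 1 - i]) {
            rc = -1;
            break;
        }
    }

    free(in);
    free(out);                          /* keeps the leak check clean */
    return rc;
}
```

To use it, add a main() that feeds run_harness() the same input the Hadoop job sees, build with debug info and no optimization (e.g. gcc -g -O0 harness.c -o harness), and run valgrind --track-origins=yes ./harness. Since no JVM is involved, --trace-children isn't needed, and --track-origins=yes makes memcheck report where each uninitialized value was allocated.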
