[
https://issues.apache.org/jira/browse/HADOOP-11127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950177#comment-14950177
]
Alan Burlison commented on HADOOP-11127:
----------------------------------------
I agree that #3 might be technically possible but, as you say, it requires
wholesale changes to the Hadoop distro build and release process, and I'm simply
not in a position to do that. The problem I'm trying to fix here is the JAR/JNI
API mismatch. Yes, fixing it is a necessary prerequisite for building portable
Hadoop distros that include the native components for each supported platform,
but that isn't the immediate problem I'm trying to solve.
The original scenario was a YARN instance being asked to run submissions where
the JNI library version installed on the server was different from the Hadoop
version of the job submitted by the client. Whilst you *could* require the
client to provide matching JNI files, it would have to do so for every
architecture the job might conceivably run on. Alternatively, you could just
make sure the server has the correct JNI libraries installed for all the
versions of Hadoop it is prepared to support. The effect would be the same, but
I suspect setting up the server so that it has all the JNI versions it needs
will be easier, at least in the short term.
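In that model the server end just keeps one copy of libhadoop per release it
supports and picks the matching one once the job's Hadoop version is known. As
a purely illustrative sketch (the directory layout and names here are made up,
not anything Hadoop does today):
{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class VersionedLoader {
    /**
     * Load the libhadoop build matching the Hadoop version of the submitted
     * job from a server-side directory holding one copy per release, e.g.
     * /usr/lib/hadoop/native/2.6.0/libhadoop.so. Paths are illustrative only.
     */
    public static void loadFor(String hadoopVersion) {
        Path lib = Paths.get("/usr/lib/hadoop/native", hadoopVersion, "libhadoop.so");
        if (!Files.isRegularFile(lib)) {
            throw new UnsatisfiedLinkError(
                "No native library installed for Hadoop " + hadoopVersion + " at " + lib);
        }
        System.load(lib.toString());
    }
}
{code}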
I've had a look at snappy and it does roughly what I described in an earlier
comment - it extracts the appropriate JNI file from the JAR to the filesystem,
reads it back to make sure it's the same as the version in the JAR file, makes
it executable and then loads it. It has to generate a unique filename for the
extracted file (usually under /tmp) and it has to be sure to remove it when the
JVM exits. That's messy, and if there are JVM aborts it is, over time, going to
clutter up /tmp. Some sort of persistent, shared location with appropriate file
locking etc. would seem to fit the Hadoop model better, but is obviously more
complicated to implement.
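For reference, the extract-and-load dance looks roughly like this - a
simplified sketch of the pattern rather than snappy's actual code, with
illustrative names:
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class JniExtractor {
    /**
     * Extract a native library bundled in the JAR to a unique temporary file,
     * verify the copy, mark it executable and load it. The file is removed on
     * normal JVM exit; an aborted JVM leaves it behind under /tmp.
     */
    public static void loadBundledLibrary(String resource) throws IOException {
        byte[] original;
        try (InputStream in = JniExtractor.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Native library not on classpath: " + resource);
            }
            original = in.readAllBytes();
        }

        // Unique filename under java.io.tmpdir (usually /tmp).
        Path extracted = Files.createTempFile("libhadoop-", ".so");
        Files.write(extracted, original);

        // Read it back and compare with the copy in the JAR.
        if (!Arrays.equals(original, Files.readAllBytes(extracted))) {
            throw new IOException("Extracted library does not match the bundled copy");
        }

        extracted.toFile().setExecutable(true);
        extracted.toFile().deleteOnExit();   // no cleanup if the JVM aborts
        System.load(extracted.toAbsolutePath().toString());
    }
}
{code}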
The versioning question is a good one, and really it only depends on when you'd
want to allow incompatible changes to the Java/JNI interface. If you want to
allow them in point releases then yes, you need a 3-part version; for minor
releases, a 2-part version, and so on. As the Java/JNI interface is not a
published API, I think allowing changes even in point releases is probably best
- you might need to do that to fix bugs. I'll therefore update the proposal to
use a 3-part version.
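To be concrete, the sort of check I have in mind at load time would be along
these lines. This is only a sketch: getBuildVersion() is a hypothetical JNI
entry point, not something libhadoop exposes today.
{code:java}
/**
 * Sketch of a 3-part (major.minor.patch) compatibility check between the
 * version hadoop-common.jar was built against and the version reported by
 * the loaded native library. getBuildVersion() is hypothetical.
 */
public final class NativeVersionCheck {
    // Version the Java side was built against.
    private static final String EXPECTED = "3.0.0";

    // Hypothetical JNI entry point returning the library's build version.
    private static native String getBuildVersion();

    public static void verify() {
        String actual = getBuildVersion();
        // Requiring an exact 3-part match allows incompatible Java/JNI
        // changes even in point releases.
        if (!EXPECTED.equals(actual)) {
            throw new UnsatisfiedLinkError(
                "libhadoop version " + actual + " does not match hadoop-common " + EXPECTED);
        }
    }
}
{code}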
> Improve versioning and compatibility support in native library for downstream
> hadoop-common users.
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-11127
> URL: https://issues.apache.org/jira/browse/HADOOP-11127
> Project: Hadoop Common
> Issue Type: Bug
> Components: native
> Reporter: Chris Nauroth
> Assignee: Alan Burlison
> Attachments: HADOOP-11064.003.patch, proposal.txt
>
>
> There is no compatibility policy enforced on the JNI function signatures
> implemented in the native library. This library typically is deployed to all
> nodes in a cluster, built from a specific source code version. However,
> downstream applications that want to run in that cluster might choose to
> bundle a hadoop-common jar at a different version. Since there is no
> compatibility policy, this can cause link errors at runtime when the native
> function signatures expected by hadoop-common.jar do not exist in
> libhadoop.so/hadoop.dll.