Hello Romeo,

Inline…
On Wed, Jan 25, 2012 at 4:07 PM, Romeo Kienzler <[email protected]> wrote:
> Dear List,
>
> we're trying to use a central HDFS storage in order to be accessed from
> various other Hadoop-Distributions.

The HDFS you've set up, what 'distribution' is that from? You will have to use that particular version's jar across all the client applications you use, else you'll run into RPC version incompatibilities.

> Do you think this is possible? We're having trouble, but not related to
> different RPC-Versions.

It should be possible _most of the time_ by replacing the jars at the client end with the ones that run your cluster, but there may be minor API incompatibilities between certain versions that can get in the way. It depends purely on your client application and its implementation. If it sticks to the publicly supported APIs, you are mostly fine.

> When trying to access a Cloudera CDH3 Update 2 (cdh3u2) HDFS from
> BigInsights 1.3 we're getting this error:

BigInsights runs off IBM's own patched Hadoop sources if I am right, and things can get a bit tricky there. See the following points:

> Bad connection to FS. Command aborted. Exception: Call to
> localhost.localdomain/127.0.0.1:50070 failed on local exception:
> java.io.EOFException
> java.io.IOException: Call to localhost.localdomain/127.0.0.1:50070 failed on
> local exception: java.io.EOFException

This is surely an RPC issue. The call tries to read a field, gets no response, hits EOF and dies. We have more descriptive error messages from version 0.23 onwards, but the problem here is that your IBM client jar is not the same as your cluster's jar. The mixture won't work.

> com.ibm.biginsights.hadoop.patch.PatchedDistributedFileSystem.initialize(PatchedDistributedFileSystem.java:19)

^^ This is what I am speaking of. Your client (BigInsights? Have not used it really…) is using an IBM jar with their supplied 'PatchedDistributedFileSystem', and that is probably incompatible with the cluster's HDFS RPC protocols. I do not know enough about IBM's custom stuff to know for sure that it would work if you replaced it with your cluster's jar.

> But we've already replaced the client hadoop-common.jar's with the Cloudera
> ones.

Apparently not. Your stack trace shows that com.ibm.* classes are still being pulled in. My guess is that BigInsights would not work with anything non-IBM, but I have not used it to know for sure. If they have a user community, you can ask there whether there is a working way to run BigInsights against Apache/CDH/etc. distributions.

For CDH-specific questions, you may ask at https://groups.google.com/a/cloudera.org/group/cdh-user/topics instead of the Apache lists here.

-- 
Harsh J
Customer Ops. Engineer, Cloudera
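
P.S. If it helps to see where the com.ibm.* classes are still coming from, here is a rough sketch (not something I have run against BigInsights) that lists the jars under a directory containing a given class. The function name `jars_containing` and the script name in the usage line are made up for illustration, and the lib directory is whatever your client install actually uses.

```python
import sys
import zipfile
from pathlib import Path


def jars_containing(class_name, lib_dir):
    """Return the jars under lib_dir that contain the given fully qualified class."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(lib_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(str(jar))
        except (zipfile.BadZipFile, OSError):
            # Skip broken symlinks and files named *.jar that are not valid archives
            continue
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name):
    #   python find_class.py <fully.qualified.ClassName> <client lib directory>
    for path in jars_containing(sys.argv[1], sys.argv[2]):
        print(path)
```

Running it with com.ibm.biginsights.hadoop.patch.PatchedDistributedFileSystem against the client's lib directory should show which jar still supplies the patched filesystem class. It only tells you where a class lives on disk, not what the JVM actually loads at runtime, so treat the output as a hint.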
