Dear all,

First of all, the reason for this is that we have a lot of data in Cloudera but want to test BigSheets (from BigInsights) and Datameer against the same HDFS source (instead of re-importing).

Thanks a lot for your suggestion. I finally got it working; here are the steps I took:

1. install Cloudera
2. stop all Cloudera services (flume-master, flume-node, hadoop-0.20-datanode, hadoop-0.20-jobtracker, hadoop-0.20-namenode, hadoop-0.20-secondarynamenode, hadoop-0.20-tasktracker, hadoop-hbase-master, hadoop-zookeeper-server, hue, oozie, sqoop-metastore) and delete their init links in rc3.d/rc5.d (except hadoop-0.20-datanode, hadoop-0.20-jobtracker, hadoop-0.20-namenode, hadoop-0.20-secondarynamenode)
3. install BigInsights
4. start BigInsights via /opt/ibm/biginsights/bin/start-all.sh (as user biadmin)
5. find the PIDs of the namenode, secondarynamenode, and datanode processes and kill -9 them
6. cd /etc/init.d
7. ./hadoop-0.20-datanode start && sleep 5 && ./hadoop-0.20-namenode start && ./hadoop-0.20-secondarynamenode start
8. su - biadmin
9. in /opt/ibm/biginsights/hadoop-conf/hadoop-env.sh, replace "export JAVA_HOME=$BIGINSIGHTS_HOME/jdk" with "export JAVA_HOME=/usr/java/jdk1.6.0_21/jre"
10. copy the Cloudera jar to /opt/ibm/biginsights/IHC/hadoop-0.20.2-core.jar and /opt/ibm/biginsights/hdm/IHC/hadoop-0.20.2-core.jar, but add PatchedDistributedFileSystem.class to it first
11. in /opt/ibm/biginsights/hadoop-conf/core-site.xml, replace hdfs://localhost.localdomain:9000 with hdfs://localhost.localdomain:8020
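
The edit in step 11 amounts to repointing the default filesystem URI at the port the Cloudera NameNode listens on (8020 instead of 9000). A sketch of the relevant property as it would look in core-site.xml for Hadoop 0.20:

```xml
<!-- /opt/ibm/biginsights/hadoop-conf/core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost.localdomain:8020</value>
</property>
```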

Now a /opt/ibm/biginsights/IHC/bin/hadoop fs -ls goes to Cloudera HDFS.
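
For anyone scripting this, steps 5-7 can be sketched roughly as below. The pgrep patterns and init-script paths are assumptions taken from the steps above; adjust them to match your environment before running anything.

```shell
#!/bin/sh
# Step 5: kill -9 every process whose command line matches a pattern.
kill_matching() {
    pids=$(pgrep -f "$1") || return 0   # nothing to do if no match
    kill -9 $pids
}

# Stop the HDFS daemons that BigInsights' start-all.sh brought up
# (process name patterns are assumptions; check with ps/jps first):
# kill_matching 'NameNode'
# kill_matching 'SecondaryNameNode'
# kill_matching 'DataNode'

# Steps 6-7: bring up the Cloudera daemons in their place.
# cd /etc/init.d
# ./hadoop-0.20-datanode start && sleep 5 && \
#     ./hadoop-0.20-namenode start && ./hadoop-0.20-secondarynamenode start
```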

Best Regards,

Romeo

On 01/25/2012 02:19 PM, Michael Segel wrote:
Alex,
I said I would be nice and hold my tongue when it comes to IBM and their IM 
pillar products... :-)

You could write a client that talks to two  different hadoop versions but then 
you would be using hftp which is what you have under the hood in distcp...

But that doesn't seem to be what he wants to do... I can only imagine why he is 
asking this question... ;-)

Sent from my iPhone

On Jan 25, 2012, at 7:32 AM, "alo alt"<[email protected]>  wrote:

Insights is an IBM-related product, based on a fork of Hadoop, I think. Mixing
totally different stacks makes no sense, and will not work, I guess.

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Jan 25, 2012, at 1:12 PM, Harsh J wrote:

Hello Romeo,

Inline…

On Wed, Jan 25, 2012 at 4:07 PM, Romeo Kienzler<[email protected]>  wrote:
Dear List,

we're trying to use a central HDFS storage in order to be accessed from
various other Hadoop-Distributions.
The HDFS you've set up, what 'distribution' is that from? You will have
to use that particular version's jar across all client applications
you use, else you'll run into RPC version incompatibilities.

Do you think this is possible? We're having trouble, but not related to
different RPC-Versions.
It should be possible _most of the time_ by replacing jars at the
client end to use the one that runs your cluster, but there may be
minor API incompatibilities between certain versions that can get in
the way. Purely depends on your client application and its
implementation. If it sticks to using the publicly supported APIs, you
are mostly fine.
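
As a concrete illustration of the jar replacement described here, something like the following could do the swap while keeping the original around (both paths are hypothetical; use your client's and your cluster's actual Hadoop core jars):

```shell
#!/bin/sh
# Replace a client's Hadoop core jar with the cluster's build, keeping a
# ".orig" backup so the swap is reversible if APIs turn out incompatible.
swap_jar() {
    client_jar="$1"    # jar the client application loads
    cluster_jar="$2"   # jar your HDFS cluster actually runs
    cp "$client_jar" "$client_jar.orig"
    cp "$cluster_jar" "$client_jar"
}

# Hypothetical usage -- adjust both paths to your installation:
# swap_jar /opt/client/lib/hadoop-0.20.2-core.jar /usr/lib/hadoop-0.20/hadoop-core.jar
```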

When trying to access a Cloudera CDH3 Update 2 (cdh3u2) HDFS from
BigInsights 1.3 we're getting this error:
BigInsights runs off IBM's own patched Hadoop sources if I am right,
and things can get a bit tricky there. See the following points:

Bad connection to FS. Command aborted. Exception: Call to
localhost.localdomain/127.0.0.1:50070 failed on local exception:
java.io.EOFException
java.io.IOException: Call to localhost.localdomain/127.0.0.1:50070 failed on
local exception: java.io.EOFException
This is surely an RPC issue. The call tries to read off a field, but
gets no response, EOFs and dies. We have more descriptive error
messages with the 0.23 version onwards, but the problem here is that
your IBM client jar is not the same as your cluster's jar. The mixture
won't work.

com.ibm.biginsights.hadoop.patch.PatchedDistributedFileSystem.initialize(PatchedDistributedFileSystem.java:19)
^^ This is what I am speaking of. Your client (BigInsights? Have not
used it really…) is using an IBM jar with their supplied
'PatchedDistributedFileSystem', and that is probably incompatible with
the cluster's HDFS RPC protocols. I do not know enough about IBM's
custom stuff to know for sure it would work if you replace it with
your cluster's jar.

But we've already replaced the client hadoop-common jars with the Cloudera
ones.
Apparently not. Your strace shows that com.ibm.* classes are still
being pulled in. My guess is that BigInsights would not work with
anything non-IBM, but I have not used it to know for sure.

If they have a user community, you can ask there if there is a working
way to have BigInsights run against Apache/CDH/etc. distributions.
For CDH specific questions, you may ask at
https://groups.google.com/a/cloudera.org/group/cdh-user/topics instead
of the Apache lists here.

--
Harsh J
Customer Ops. Engineer, Cloudera
