On 03/15/2012 09:22 AM, Manu S wrote:
Thanks a lot Bejoy, that makes sense :)
Suppose I have a MySQL database on some other node (not in the Hadoop
cluster); can I import its tables into my HDFS using Sqoop?
Yes, this is the main purpose of Sqoop.
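For example, a basic table import looks roughly like this (the hostname,
database, credentials, table name and target directory below are placeholders
for your own values):

  # run from the node where Sqoop is installed; the MySQL JDBC driver
  # must be available on Sqoop's classpath
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/mydb \
    --username dbuser --password dbpass \
    --table customers \
    --target-dir /user/hadoop/customers

Adding --hive-import would load the table into Hive instead of leaving it as
plain files in HDFS.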
On the Cloudera site you have the complete documentation for it:
Sqoop User Guide
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html
Sqoop installation
https://ccp.cloudera.com/display/CDHDOC/Sqoop+Installation
Sqoop for MySQL
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_mysql
Sqoop site on GitHub
http://github.com/cloudera/sqoop
Cloudera blog related post to Sqoop
http://www.cloudera.com/blog/category/sqoop/
Best wishes
On Thu, Mar 15, 2012 at 6:27 PM, Bejoy Ks <bejoy.had...@gmail.com> wrote:
Hi Manu
Please find my responses inline
> I had read that we can install Pig, Hive & Sqoop on the client node, with no
> need to install them in the cluster. What is the client node actually? Can I
> use my management-node as a client?
On larger clusters there is a separate node that sits outside the Hadoop
cluster, and the client tools stay there; user programs are triggered from
that node. This is the node referred to as the client node or edge node. For
your cluster, the management node and the client node can be the same.
> What is the best practice to install Pig, Hive, & Sqoop?
On a client node.
> For the fully distributed cluster, do we need to install Pig, Hive, & Sqoop
> on each node?
No, they can be installed on a client node or on any one of the nodes.
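If you are on CDH3 with yum, for example, installing them on the client node
is just a matter of pulling in the packages there. A minimal sketch, assuming
the standard CDH3 package names (these may differ for other versions or
distributions):

  # run on the client/management node
  sudo yum install hadoop-pig hadoop-hive sqoop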
> MySQL is needed for Hive as a metastore, and Sqoop can import a MySQL
> database to HDFS, Hive, or Pig, so can we make use of a MySQL DB residing on
> another node?
Regarding your first point, Sqoop import serves a different purpose: getting
data from an RDBMS into HDFS. The metastore, on the other hand, is used by
Hive when framing the MapReduce jobs corresponding to your Hive query, so
Sqoop can't help you much there.
I'd recommend keeping the Hive metastore DB on the same node where Hive is
installed, because executing Hive queries involves frequent metadata lookups,
especially when your tables have a large number of partitions.
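Concretely, that just means the JDBC settings in hive-site.xml point at a
MySQL instance running on the Hive node itself. A minimal sketch (the
database name, user and password are placeholders):

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- MySQL running locally on the Hive node -->
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepass</value>
  </property>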
Regards
Bejoy.K.S
On Thu, Mar 15, 2012 at 5:34 PM, Manu S <manupk...@gmail.com> wrote:
> Greetings All !!!
>
> I am using Cloudera CDH3 for Hadoop deployment. We have 7 nodes, of which 5
> are used for a fully distributed cluster, 1 for pseudo-distributed mode & 1
> as the management-node.
>
> Fully distributed cluster: HDFS, MapReduce & HBase cluster
> Pseudo-distributed mode: all of the above
>
> I had read that we can install Pig, Hive & Sqoop on the client node, with no
> need to install them in the cluster. What is the client node actually? Can I
> use my management-node as a client?
>
> What is the best practice to install Pig, Hive, & Sqoop?
> For the fully distributed cluster, do we need to install Pig, Hive, & Sqoop
> on each node?
>
> MySQL is needed for Hive as a metastore, and Sqoop can import a MySQL
> database to HDFS, Hive, or Pig, so can we make use of a MySQL DB residing on
> another node?
>
> --
> Thanks & Regards
> ----
> Manu S
> SI Engineer - OpenSource & HPC
> Wipro Infotech
> Mob: +91 8861302855 Skype: manuspkd
> www.opensourcetalk.co.in
>
--
Thanks & Regards
----
Manu S
SI Engineer - OpenSource & HPC
Wipro Infotech
Mob: +91 8861302855 Skype: manuspkd
www.opensourcetalk.co.in
--
Marcos Luis Ortíz Valmaseda
Sr. Software Engineer (UCI)
http://marcosluis2186.posterous.com
http://postgresql.uci.cu/blog/38
10th ANNIVERSARY OF THE CREATION OF THE UNIVERSITY OF INFORMATICS
SCIENCES...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci