On 03/15/2012 09:22 AM, Manu S wrote:
Thanks a lot Bejoy, that makes sense :)
Suppose I have a MySQL database on some other node (not in the Hadoop
cluster); can I import its tables into my HDFS using Sqoop?
Yes, this is the main purpose of Sqoop.
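For example, a basic table import looks roughly like this (the hostname,
database, credentials, table name and target directory below are placeholders
for your own values):

  # run from the node where Sqoop is installed; the MySQL JDBC driver
  # must be available on Sqoop's classpath
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/mydb \
    --username dbuser --password dbpass \
    --table customers \
    --target-dir /user/hadoop/customers

Adding --hive-import would load the table into Hive instead of leaving it as
plain files in HDFS.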
On the Cloudera site you have the complete documentation for it:
Sqoop User Guide
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html
Sqoop installation
https://ccp.cloudera.com/display/CDHDOC/Sqoop+Installation
Sqoop for MySQL
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_mysql
Sqoop site on GitHub
http://github.com/cloudera/sqoop
Cloudera blog related post to Sqoop
http://www.cloudera.com/blog/category/sqoop/
Best wishes
On Thu, Mar 15, 2012 at 6:27 PM, Bejoy Ks <bejoy.had...@gmail.com> wrote:
Hi Manu
Please find my responses inline
> I had read that we can install Pig, Hive & Sqoop on the client node, with no
> need to install them in the cluster. What is the client node actually? Can I
> use my management-node as a client?
On larger clusters there is a separate node that sits outside the Hadoop
cluster, and the client tools stay there; user programs are triggered from
that node. This is the node referred to as the client node or edge node. For
your cluster, the management node and the client node can be the same.
> What is the best practice to install Pig, Hive, & Sqoop?
On a client node.
> For the fully distributed cluster, do we need to install Pig, Hive, & Sqoop
> on each node?
No, they can be installed on a client node or on any one of the nodes.
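If you are on CDH3 with yum, for example, installing them on the client node
is just a matter of pulling in the packages there. A minimal sketch, assuming
the standard CDH3 package names (these may differ for other versions or
distributions):

  # run on the client/management node
  sudo yum install hadoop-pig hadoop-hive sqoop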
> MySQL is needed for Hive as a metastore, and Sqoop can import a MySQL
> database to HDFS, Hive, or Pig, so can we make use of a MySQL DB residing on
> another node?
Regarding your first point, Sqoop import serves a different purpose: getting
data from an RDBMS into HDFS. The metastore, on the other hand, is used by
Hive when framing the MapReduce jobs corresponding to your Hive query, so
Sqoop can't help you much there.
I'd recommend keeping the Hive metastore DB on the same node where Hive is
installed, because executing Hive queries involves frequent metadata lookups,
especially when your tables have a large number of partitions.
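Concretely, that just means the JDBC settings in hive-site.xml point at a
MySQL instance running on the Hive node itself. A minimal sketch (the
database name, user and password are placeholders):

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- MySQL running locally on the Hive node -->
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepass</value>
  </property>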
Regards
Bejoy.K.S
On Thu, Mar 15, 2012 at 5:34 PM, Manu S <manupk...@gmail.com> wrote:
> Greetings All !!!
>
> I am using Cloudera CDH3 for Hadoop deployment. We have 7 nodes, of which 5
> are used for a fully distributed cluster, 1 for pseudo-distributed mode & 1
> as the management-node.
>
> Fully distributed cluster: HDFS, MapReduce & HBase cluster
> Pseudo-distributed mode: all of the above
>
> I had read that we can install Pig, Hive & Sqoop on the client node, with no
> need to install them in the cluster. What is the client node actually? Can I
> use my management-node as a client?
>
> What is the best practice to install Pig, Hive, & Sqoop?
> For the fully distributed cluster, do we need to install Pig, Hive, & Sqoop
> on each node?
>
> MySQL is needed for Hive as a metastore, and Sqoop can import a MySQL
> database to HDFS, Hive, or Pig, so can we make use of a MySQL DB residing on
> another node?
>
> --
> Thanks & Regards
> ----
> Manu S
> SI Engineer - OpenSource & HPC
> Wipro Infotech
> Mob: +91 8861302855 Skype: manuspkd
> www.opensourcetalk.co.in
>
--
Thanks & Regards
----
Manu S
SI Engineer - OpenSource & HPC
Wipro Infotech
Mob: +91 8861302855 Skype: manuspkd
www.opensourcetalk.co.in
--
Marcos Luis Ortíz Valmaseda
Sr. Software Engineer (UCI)
http://marcosluis2186.posterous.com
http://postgresql.uci.cu/blog/38
10th ANNIVERSARY OF THE CREATION OF THE UNIVERSITY OF INFORMATICS
SCIENCES...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci