Thanks a lot, Bejoy, that makes sense :) Suppose I have a MySQL database on some other node (not in the Hadoop cluster). Can I import its tables into my HDFS using Sqoop?
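
To make the question concrete, here is a minimal sketch of what such an import might look like, written as a small Python helper that only assembles the Sqoop command line. The host name, database, table, user, and target directory are hypothetical placeholders, not values from this thread. Sqoop talks to MySQL over JDBC, so the database does not need to live inside the cluster, but its host must be reachable from the cluster nodes that run the import tasks (so "localhost" in the connect string would not work).

```python
import shlex


def build_sqoop_import(host, database, table, user, target_dir, port=3306):
    """Return the argv list for a Sqoop import from a remote MySQL server.

    All argument values are illustrative; nothing here is taken from a
    real deployment.
    """
    # The JDBC URL names the remote MySQL host. It must be a name or IP
    # that every cluster node can resolve and reach, not "localhost".
    jdbc_url = f"jdbc:mysql://{host}:{port}/{database}"
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "-P",                        # prompt for the password on the console
        "--table", table,            # source table in MySQL
        "--target-dir", target_dir,  # destination directory in HDFS
    ]


if __name__ == "__main__":
    # Hypothetical example values; print the command instead of running it.
    argv = build_sqoop_import(
        "dbhost.example.com", "sales", "orders", "manu", "/user/manu/orders"
    )
    print(" ".join(shlex.quote(a) for a in argv))
```

The helper prints the command rather than executing it, so it can be checked on a machine without Sqoop installed; the MySQL JDBC driver jar would also need to be available to Sqoop on the client node.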

On Thu, Mar 15, 2012 at 6:27 PM, Bejoy Ks <bejoy.had...@gmail.com> wrote:

> Hi Manu
> Please find my responses inline
>
> >I had read that we can install Pig, Hive & Sqoop on the client node, with
> >no need to install them in the cluster. What is the client node actually?
> >Can I use my management-node as a client?
>
> On larger clusters we have a separate node that is outside the Hadoop
> cluster, and these tools stay there. User programs are triggered from this
> node. This is the node referred to as the client node, edge node, etc. For
> your cluster, the management node and the client node can be the same.
>
> >What is the best practice to install Pig, Hive, & Sqoop?
>
> On a client node.
>
> >For the fully distributed cluster do we need to install Pig, Hive, & Sqoop
> >in each nodes?
>
> No, they can be on a client node or on any one of the nodes.
>
> >Mysql is needed for Hive as a metastore and sqoop can import mysql database
> >to HDFS or hive or pig, so can we make use of mysql DB's residing on
> >another node?
>
> Regarding your first point, Sqoop import serves a different purpose: getting
> data from an RDBMS into HDFS. The metastore, on the other hand, is used by
> Hive in framing the MapReduce jobs corresponding to your Hive query. Here
> Sqoop can't help you much.
> I recommend keeping Hive's metastore DB on the same node where Hive is
> installed, because executing Hive queries requires frequent metadata
> lookups, especially when your table has a large number of partitions.
>
> Regards
> Bejoy.K.S
>
> On Thu, Mar 15, 2012 at 5:34 PM, Manu S <manupk...@gmail.com> wrote:
>
> > Greetings All !!!
> >
> > I am using Cloudera CDH3 for Hadoop deployment. We have 7 nodes, of which
> > 5 are used for a fully distributed cluster, 1 for pseudo-distributed & 1
> > as management-node.
> >
> > Fully distributed cluster: HDFS, Mapreduce & Hbase cluster
> > Pseudo distributed mode: All
> >
> > I had read that we can install Pig, Hive & Sqoop on the client node, with
> > no need to install them in the cluster. What is the client node actually?
> > Can I use my management-node as a client?
> >
> > What is the best practice to install Pig, Hive, & Sqoop?
> > For the fully distributed cluster do we need to install Pig, Hive, & Sqoop
> > in each nodes?
> >
> > Mysql is needed for Hive as a metastore and sqoop can import mysql
> > database to HDFS or hive or pig, so can we make use of mysql DB's residing
> > on another node?
> >
> > --
> > Thanks & Regards
> > ----
> > Manu S
> > SI Engineer - OpenSource & HPC
> > Wipro Infotech
> > Mob: +91 8861302855 Skype: manuspkd
> > www.opensourcetalk.co.in

--
Thanks & Regards
----
Manu S
SI Engineer - OpenSource & HPC
Wipro Infotech
Mob: +91 8861302855 Skype: manuspkd
www.opensourcetalk.co.in