Hi Sujit,

Srinivas is asking how to import data into HDFS using Sqoop. I believe he must have thought this through before designing the entire architecture/solution. He has not specified whether he would like to modify the data or not. Whether to use Hive or HBase is a different question altogether and depends on his use case.
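
For what it's worth, here is a minimal sketch of how the per-table Sqoop commands could be generated and run with a cap on parallelism, rather than launching everything at once. The host, database, and table names, the `/data/mysql` target root, and the `etl_user` account are all hypothetical placeholders, and credential handling is left out entirely. The helper names (`build_import_command`, `build_all_commands`, `run_all`) are made up for illustration and are not part of Sqoop.

```python
import itertools
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_import_command(host, database, table,
                         target_root="/data/mysql", username="etl_user"):
    """Return the argv list for one `sqoop import` of a single MySQL table.

    target_root and username are placeholder assumptions; password options
    are deliberately omitted from this sketch.
    """
    return [
        "sqoop", "import",
        "--connect", f"jdbc:mysql://{host}/{database}",
        "--username", username,
        "--table", table,
        # One HDFS directory per shard/database/table so imports never collide.
        "--target-dir", f"{target_root}/{host}/{database}/{table}",
        # A single mapper per table keeps each job small.
        "--num-mappers", "1",
    ]


def build_all_commands(hosts, databases, tables, **kwargs):
    """Build one import command for every (host, database, table) combination."""
    return [
        build_import_command(host, database, table, **kwargs)
        for host, database, table in itertools.product(hosts, databases, tables)
    ]


def run_all(commands, max_parallel=8, runner=subprocess.call):
    """Run the commands with at most max_parallel in flight.

    Returns the commands whose exit code was non-zero, so they can be
    retried or reported.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        exit_codes = list(pool.map(runner, commands))
    return [cmd for cmd, code in zip(commands, exit_codes) if code != 0]
```

Something like `failed = run_all(build_all_commands(hosts, databases, tables))` would then give a single list of failed imports to look at, which addresses part of the monitoring concern. Whether the cluster and the MySQL servers can cope with that many map-reduce jobs is a separate question.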

Thanks,
Anil

On Thu, May 31, 2012 at 9:52 PM, Sujit Dhamale <sujitdhamal...@gmail.com> wrote:

> Hi,
> instead of pulling 70K tables from mysql into hdfs,
> take a dump of all 30 tables and put it into the HBase database.
>
> if you pull 70K tables from mysql into hdfs, you need to use Hive, but
> modification will not be possible in Hive :(
>
> *@ common-user :* please correct me if I am wrong.
>
> Kind Regards
> Sujit Dhamale
> (+91 9970086652)
> On Fri, Jun 1, 2012 at 5:42 AM, Edward Capriolo <edlinuxg...@gmail.com>
> wrote:
>
> > Maybe you can do some VIEWs or unions or merge tables on the mysql
> > side to overcome the aspect of launching so many sqoop jobs.
> >
> > On Thu, May 31, 2012 at 6:02 PM, Srinivas Surasani
> > <hivehadooplearn...@gmail.com> wrote:
> > > All,
> > >
> > > We are trying to implement sqoop in our environment, which has 30 mysql
> > > sharded databases, and all the databases have around 30 databases with
> > > 150 tables in each database, which are all sharded (horizontally
> > > sharded, meaning the data is divided across all the tables in mysql).
> > >
> > > The problem is that we have a total of around 70K tables which need
> > > to be pulled from mysql into hdfs.
> > >
> > > So, my question is whether generating 70K sqoop commands and running
> > > them in parallel is feasible or not?
> > >
> > > Also, doing incremental updates is going to be like invoking another
> > > 70K sqoop jobs, which in turn kick off map-reduce jobs.
> > >
> > > The main problem is monitoring and managing this huge number of jobs.
> > >
> > > Can anyone suggest the best way of doing it, or is sqoop a good
> > > candidate for this type of scenario?
> > >
> > > Currently the same process is done by generating tsv files on the mysql
> > > server, dumping them onto a staging server, and from there generating
> > > hdfs put statements.
> > >
> > > Appreciate your suggestions !!!
> > >
> > >
> > > Thanks,
> > > Srinivas Surasani
> > >

--
Thanks & Regards,
Anil Gupta