All,

I'm trying to get data into HDFS directly from the sharded databases and expose it to the existing Hive infrastructure.
(We are currently doing it this way: mysql -> staging server -> hdfs put commands -> hdfs, which is taking a lot of time.) If we had a way of running a single Sqoop job across all shards for a single table, I believe it would make life easier in terms of monitoring and exception handling.

Thanks,
Srinivas

On Fri, Jun 1, 2012 at 1:27 AM, anil gupta <anilgupt...@gmail.com> wrote:
> Hi Sujith,
>
> Srinivas is asking how to import data into HDFS using Sqoop. I believe he
> must have thought it out well before designing the entire
> architecture/solution. He has not specified whether he would like to
> modify the data or not. Whether to use Hive or HBase is a different
> question altogether and depends on his use case.
>
> Thanks,
> Anil
>
> On Thu, May 31, 2012 at 9:52 PM, Sujit Dhamale <sujitdhamal...@gmail.com> wrote:
>> Hi,
>> Instead of pulling 70K tables from MySQL into HDFS,
>> take a dump of all 30 tables and put them into the HBase database.
>>
>> If you pull 70K tables from MySQL into HDFS, you need to use Hive, but
>> modification will not be possible in Hive :(
>>
>> *@ common-user:* please correct me if I am wrong.
>>
>> Kind Regards,
>> Sujit Dhamale
>> (+91 9970086652)
>>
>> On Fri, Jun 1, 2012 at 5:42 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>>> Maybe you can do some VIEWs or unions or merge tables on the MySQL
>>> side to overcome the aspect of launching so many Sqoop jobs.
>>>
>>> On Thu, May 31, 2012 at 6:02 PM, Srinivas Surasani
>>> <hivehadooplearn...@gmail.com> wrote:
>>>> All,
>>>>
>>>> We are trying to implement Sqoop in our environment, which has 30
>>>> sharded MySQL database servers; each of them has around 30 databases
>>>> with 150 tables per database, all horizontally sharded (meaning the
>>>> data is divided across all the tables in MySQL).
>>>>
>>>> The problem is that we have a total of around 70K tables which need
>>>> to be pulled from MySQL into HDFS.
>>>>
>>>> So my question is: is generating 70K Sqoop commands and running them
>>>> in parallel feasible or not?
>>>>
>>>> Also, doing incremental updates is going to mean invoking another 70K
>>>> Sqoop jobs, which in turn kick off map-reduce jobs.
>>>>
>>>> The main problem is monitoring and managing this huge number of jobs.
>>>>
>>>> Can anyone suggest the best way of doing this, or is Sqoop a good
>>>> candidate for this type of scenario?
>>>>
>>>> Currently the same process is done by generating TSV files on the
>>>> MySQL server, dumping them onto a staging server, and from there
>>>> generating hdfs put statements.
>>>>
>>>> Appreciate your suggestions!
>>>>
>>>> Thanks,
>>>> Srinivas Surasani

> --
> Thanks & Regards,
> Anil Gupta

--
Regards,
-- Srinivas
srini...@cloudwick.com
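For what it's worth, below is a rough sketch of the kind of per-shard import loop being discussed: one Sqoop import per shard for a single table, with each shard landing in its own HDFS directory so an external Hive table partitioned by shard can cover all of them. Every name in it (shard hosts, JDBC URLs, credentials, database, table, and paths) is a placeholder, not anything taken from this thread.

    #!/bin/bash
    # Sketch only (hypothetical names throughout): run one Sqoop import per
    # shard for a single table, writing each shard into its own HDFS directory
    # so an external Hive table partitioned by shard can see all of them.

    TABLE=orders                              # placeholder table name
    SHARDS="shard01 shard02 shard03"          # placeholder shard host list
    MYSQL_PASS=$(cat /home/etl/.mysql_pass)   # placeholder credential source

    for SHARD in $SHARDS; do
      sqoop import \
        --connect "jdbc:mysql://${SHARD}.db.example.com/appdb" \
        --username etl \
        --password "$MYSQL_PASS" \
        --table "$TABLE" \
        --target-dir "/data/raw/${TABLE}/shard=${SHARD}" \
        --fields-terminated-by '\t' \
        --num-mappers 4 &
    done
    wait   # imports run in parallel; block until every shard has finished

    # Then register each shard directory as a partition of an external
    # Hive table, e.g.:
    #   hive -e "ALTER TABLE ${TABLE} ADD IF NOT EXISTS PARTITION (shard='shard01')
    #            LOCATION '/data/raw/${TABLE}/shard=shard01'"

For the incremental-load concern, Sqoop's --incremental append / lastmodified modes (with --check-column and --last-value), or a saved job created via "sqoop job --create" that remembers the last value between runs, could be dropped into the same loop, though that still means one invocation per shard per table.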