Hi, instead of pulling 70K tables from MySQL into HDFS, take a dump of all 30 databases and load them into HBase instead.
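For example, a single table can go straight into HBase with something like the following (just a rough sketch: the connection URL, table name, row key, and column family here are only placeholders, not your real names):

  # Import one MySQL table directly into an HBase table (placeholder names).
  sqoop import \
    --connect jdbc:mysql://mysql-host/shard_db_01 \
    --username sqoop_user --password '***' \
    --table customers \
    --hbase-table customers \
    --column-family d \
    --hbase-row-key id \
    --hbase-create-table

HBase then lets you update rows in place, which plain HDFS files queried through Hive do not.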
If you pull 70K tables from MySQL into HDFS, you will need to use Hive, but modifications are not possible in Hive :(

@common-user: please correct me if I am wrong.

Kind Regards
Sujit Dhamale (+91 9970086652)

On Fri, Jun 1, 2012 at 5:42 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> Maybe you can do some VIEWs or unions or merge tables on the mysql
> side to overcome the aspect of launching so many sqoop jobs.
>
> On Thu, May 31, 2012 at 6:02 PM, Srinivas Surasani
> <hivehadooplearn...@gmail.com> wrote:
> > All,
> >
> > We are trying to implement Sqoop in our environment, which has 30
> > sharded MySQL databases; each of them has around 30 databases with
> > 150 tables apiece, all horizontally sharded (the data is divided
> > across all the tables in MySQL).
> >
> > The problem is that we have a total of around 70K tables that need
> > to be pulled from MySQL into HDFS.
> >
> > So my question is: is it feasible to generate 70K Sqoop commands and
> > run them in parallel?
> >
> > Also, doing incremental updates would mean invoking another 70K
> > Sqoop jobs, which in turn kick off map-reduce jobs.
> >
> > The main problem is monitoring and managing this huge number of jobs.
> >
> > Can anyone suggest the best way of doing this, or whether Sqoop is a
> > good candidate for this type of scenario?
> >
> > Currently the same process is done by generating TSV files on the
> > MySQL server, dumping them onto a staging server, and from there
> > generating hdfs put statements.
> >
> > Appreciate your suggestions!
> >
> >
> > Thanks,
> > Srinivas Surasani
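For reference, the per-table job generation discussed above would look roughly like the sketch below. This is only an illustration of the scale involved; the host/database/table list file, credentials, and column names are all made up, and a real setup would also have to track the last imported value per table for the incremental runs.

  # One sqoop import per (host, database, table) triple read from a list file.
  # With ~70K entries this means ~70K separate map-reduce jobs per pass.
  while read host db table; do
    sqoop import \
      --connect "jdbc:mysql://${host}/${db}" \
      --username sqoop_user --password '***' \
      --table "${table}" \
      --target-dir "/data/raw/${db}/${table}" \
      --incremental append \
      --check-column id \
      --last-value 0   # would need to be tracked per table in practice
  done < all_70k_tables.tsv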