Hi, instead of pulling 70K tables from MySQL into HDFS, take a dump of all 30 databases and load them into HBase instead.
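For example, a single table can go straight into HBase with something like the following (just a rough sketch: the connection URL, table name, row key, and column family here are only placeholders, not your real names):

  # Import one MySQL table directly into an HBase table (placeholder names).
  sqoop import \
    --connect jdbc:mysql://mysql-host/shard_db_01 \
    --username sqoop_user --password '***' \
    --table customers \
    --hbase-table customers \
    --column-family d \
    --hbase-row-key id \
    --hbase-create-table

HBase then lets you update rows in place, which plain HDFS files queried through Hive do not.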
If you pull 70K tables from MySQL into HDFS, you will need to use Hive, but modifications are not possible in Hive :(

@common-user: please correct me if I am wrong.

Kind Regards
Sujit Dhamale (+91 9970086652)

On Fri, Jun 1, 2012 at 5:42 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> Maybe you can do some VIEWs or unions or merge tables on the mysql
> side to overcome the aspect of launching so many sqoop jobs.
>
> On Thu, May 31, 2012 at 6:02 PM, Srinivas Surasani
> <hivehadooplearn...@gmail.com> wrote:
> > All,
> >
> > We are trying to implement Sqoop in our environment, which has 30
> > sharded MySQL databases; each of them has around 30 databases with
> > 150 tables apiece, all horizontally sharded (the data is divided
> > across all the tables in MySQL).
> >
> > The problem is that we have a total of around 70K tables that need
> > to be pulled from MySQL into HDFS.
> >
> > So my question is: is it feasible to generate 70K Sqoop commands and
> > run them in parallel?
> >
> > Also, doing incremental updates would mean invoking another 70K
> > Sqoop jobs, which in turn kick off map-reduce jobs.
> >
> > The main problem is monitoring and managing this huge number of jobs.
> >
> > Can anyone suggest the best way of doing this, or whether Sqoop is a
> > good candidate for this type of scenario?
> >
> > Currently the same process is done by generating TSV files on the
> > MySQL server, dumping them onto a staging server, and from there
> > generating hdfs put statements.
> >
> > Appreciate your suggestions!
> >
> >
> > Thanks,
> > Srinivas Surasani
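For reference, the per-table job generation discussed above would look roughly like the sketch below. This is only an illustration of the scale involved; the host/database/table list file, credentials, and column names are all made up, and a real setup would also have to track the last imported value per table for the incremental runs.

  # One sqoop import per (host, database, table) triple read from a list file.
  # With ~70K entries this means ~70K separate map-reduce jobs per pass.
  while read host db table; do
    sqoop import \
      --connect "jdbc:mysql://${host}/${db}" \
      --username sqoop_user --password '***' \
      --table "${table}" \
      --target-dir "/data/raw/${db}/${table}" \
      --incremental append \
      --check-column id \
      --last-value 0   # would need to be tracked per table in practice
  done < all_70k_tables.tsv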