Rajendra, (I believe common-dev isn't for user questions on "common".)
Take a quick look at what "ETL" stands for: E -> Extract/Read; T -> Transform (presumably through MapReduce); L -> Load. Nothing in Hadoop handles E and L for *arbitrary* data sources and sinks out of the box. (By the way, the same can be said of a whole lot of other ETL tools.) So you will have to do some work.

You also need to work out whether your warehouse data can be "T"-ed through MapReduce in some useful way; that is a separate, Hadoop-independent question you need to look into.

There may be a way to define data input and output streams for Hive; you should look into that. If such a facility doesn't exist, it should be possible to create one, but it will involve some work. Sqoop might be usable for you, but I believe you'll need to build some tooling of your own in this case.

I hope this is helpful.

- m.

On Thu, Dec 10, 2009 at 9:51 AM, Palikala, Rajendra (CCL) <[email protected]> wrote:

> Is any one using Hadoop for ETL in Datawarehousing? Please advise. I know
> about Hive.
>
> -----Original Message-----
> From: momina khan [mailto:[email protected]]
> Sent: Thursday, December 10, 2009 7:17 AM
> To: [email protected]
> Subject: Re: A beginner question
>
> the best place to start is the MapReduce paper by Jeff Dean ... and try
> googling a talk by google's Aron on MapReduce
>
> momina
>
> On Thu, Dec 10, 2009 at 4:12 PM, Neo Anderson <[email protected]> wrote:
>
> > Hi
> >
> > I am interested in distributed computing and would like to learn core
> > concepts, e.g. MapReduce. However, I am new to Hadoop. So I have a
> > question: is there any simple task (e.g. a JIRA issue) that would be
> > good for a beginner to start with?
> >
> > I appreciate any suggestion.
> >
> > Thank you very much.
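P.S. To make the E/T/L split above concrete: here is a toy sketch of the pattern, not Hadoop code. A word-count-style aggregation stands in for the "T" step that MapReduce would handle; the `extract`, `transform`, and `load` names and the in-memory source/sink are all hypothetical, chosen just to illustrate which pieces you have to build yourself.

```python
from collections import Counter

def extract(lines):
    # E: read records from some source (here, an in-memory list of lines).
    # With Hadoop you must supply this part yourself for an arbitrary source.
    for line in lines:
        yield line.strip()

def transform(records):
    # T: a word-count-style aggregation, the kind of job MapReduce handles.
    counts = Counter()
    for record in records:
        counts.update(record.lower().split())
    return counts

def load(counts, sink):
    # L: write results to some sink (here, a plain dict).
    # Again, Hadoop does not provide this for an arbitrary warehouse.
    sink.update(counts)
    return sink

source = ["Hadoop does the T step", "you build the E and L steps"]
warehouse = load(transform(extract(source)), {})
print(warehouse["the"])  # "the" appears once in each line -> 2
```

The point of the sketch: only the middle function maps onto what Hadoop gives you; the outer two are the tooling you (or Sqoop, where it fits) have to provide.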
