Eric,

Thanks for the details. I took a quick look at the link and it seems like a tool that would help me here. Do I need to download the whole Cloudera Distribution for Hadoop <http://www.cloudera.com/hadoop> just to get Sqoop? I already have Hadoop, Hive, and Pig set up.

I appreciate your input,
Shiva
On Fri, Jan 22, 2010 at 1:53 PM, Eric Sammer <[email protected]> wrote:

> On 1/22/10 3:09 PM, Shiva wrote:
> > I can try that. Here is what I am trying to do.
> >
> > Load some fact data from a file (say, weblogs moved to HDFS after some
> > cleanup and transformation) and then do summarization at a daily or
> > weekly level. In that case, I would like to create one fact table which
> > gets loaded with daily data, and bring dimensional data over from MySQL
> > to perform the summarization.
> >
> > I appreciate any input on this technique, its performance, and how I
> > can get dimensional data into Hive (from MySQL -> to file -> HDFS ->
> > Hive).
> > Thanks,
> > Shiva
>
> Shiva:
>
> This is very common. I use Hive to do something very similar.
>
> Cloudera has a tool called sqoop that will "export" MySQL tables to
> files on HDFS that Hive can understand. Once there, you can easily join
> the data in your Hive queries.
>
> http://www.cloudera.com/hadoop-sqoop
>
> Sqoop is smarter than just doing an export to a local file system and
> then copying to HDFS, and should save you a fair amount of time and
> effort. Check out the link.
>
> Hope this helps.
> --
> Eric Sammer
> [email protected]
> http://esammer.blogspot.com
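For anyone following the same recipe, here is a minimal sketch of the MySQL -> HDFS -> Hive flow described above. The table names (dim_date, weblog_facts), columns, connection string, and credentials are all hypothetical, and the flags shown are the Sqoop 1.x import options, which may differ slightly from the version bundled with CDH at the time of this thread:

    # Hypothetical example: pull a MySQL dimension table into HDFS as
    # tab-delimited text. Host, database, credentials, and table name
    # are placeholders.
    sqoop import \
      --connect jdbc:mysql://dbhost/warehouse \
      --username etl --password secret \
      --table dim_date \
      --fields-terminated-by '\t' \
      --target-dir /user/hive/warehouse/dim_date

    -- Point a Hive external table at the Sqoop output directory, then
    -- join it against the weblog fact table for a weekly rollup.
    CREATE EXTERNAL TABLE dim_date (
      date_key INT,
      cal_date STRING,
      week_of_year INT
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/user/hive/warehouse/dim_date';

    SELECT d.week_of_year, COUNT(*) AS hits
    FROM weblog_facts f
    JOIN dim_date d ON (f.date_key = d.date_key)
    GROUP BY d.week_of_year;

Because the external table just points at the directory Sqoop wrote, the dimension data is queryable as soon as the import finishes, so the daily fact load and the weekly summarization can stay entirely inside Hive.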
