Re: [jira] Commented: (MAPREDUCE-1849) Implement a FlumeJava-like library for operations over parallel collections using Hadoop MapReduce

Milind A Bhandarkar Thu, 10 Jun 2010 08:49:33 -0700

I think a prerequisite for implementing FlumeJava is to improve JobControl to 
allow DAGs of Hadoop jobs, so that independent jobs can be executed in 
parallel. It also needs to be enriched with intermediate data management.
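
To make the DAG idea concrete, here is a rough sketch of the scheduling 
behaviour only. It is not existing JobControl code: JobDag and its add/run 
methods are hypothetical names, plain Runnables stand in for Hadoop jobs, and 
intermediate data management is left out entirely.

```java
import java.util.*;
import java.util.concurrent.*;

/** Hypothetical helper (not a Hadoop class): runs a DAG of jobs on a thread pool. */
final class JobDag {
    // Insertion order is kept so that it doubles as a topological order.
    private final Map<String, Runnable> jobs = new LinkedHashMap<>();
    private final Map<String, List<String>> deps = new HashMap<>();

    /** Registers a job; every dependency must already be registered, which rules out cycles. */
    JobDag add(String name, Runnable body, String... dependsOn) {
        for (String d : dependsOn) {
            if (!jobs.containsKey(d)) {
                throw new IllegalArgumentException("unknown dependency: " + d);
            }
        }
        jobs.put(name, body);
        deps.put(name, Arrays.asList(dependsOn));
        return this;
    }

    /** Starts each job as soon as all of its dependencies finish; blocks until all jobs are done. */
    void run(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, CompletableFuture<Void>> futures = new HashMap<>();
            for (String name : jobs.keySet()) {
                // Dependencies were registered earlier, so their futures already exist.
                CompletableFuture<?>[] upstream = deps.get(name).stream()
                        .map(futures::get)
                        .toArray(CompletableFuture[]::new);
                futures.put(name,
                        CompletableFuture.allOf(upstream).thenRunAsync(jobs.get(name), pool));
            }
            // Wait for the whole DAG; a failed job surfaces here as a CompletionException.
            CompletableFuture.allOf(futures.values().toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

With jobs "a" and "b" registered without dependencies and "c" registered as 
depending on both, "a" and "b" may run concurrently while "c" starts only 
after both have finished.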

A simpler alternative would be to implement FlumeJava on top of Oozie.

Ideally, FlumeJava should be a Pig backend.

----- Original Message -----
From: Jeff Hammerbacher (JIRA) <[email protected]>
To: [email protected] <[email protected]>
Sent: Thu Jun 10 08:31:18 2010
Subject: [jira] Commented: (MAPREDUCE-1849) Implement a FlumeJava-like library 
for operations over parallel collections using Hadoop MapReduce

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877451#action_12877451 ]

Jeff Hammerbacher commented on MAPREDUCE-1849:
----------------------------------------------

Owen: sure. They provide "derived operators" as well, like count(), join(), and 
top(). The main difference from Pig seems to be allowing users to work in Java. 
In fact, the Google team initially implemented their approach in a new language 
called Lumberjack, but they mention that, among other things, the implementation of 
a new language was a lot of work, and most importantly, novelty is an obstacle 
to adoption. They settled on Java and seem to have had some internal success.

> Implement a FlumeJava-like library for operations over parallel collections 
> using Hadoop MapReduce
> --------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1849
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1849
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Jeff Hammerbacher
>
> The API used internally at Google is described in great detail at 
> http://portal.acm.org/citation.cfm?id=1806596.1806638.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
