[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877477#action_12877477
 ] 

Jake Mannix commented on MAPREDUCE-1849:
----------------------------------------

[quote]
The main difference from Pig seems to be allowing users to work in Java.
[quote]

To add my $0.02: FlumeJava lets developers work in an object-oriented
language, *period*.  The difference between writing a Pig "script" or a SQL
(or Hive variant thereof) "query" and being able to seamlessly integrate
distributed primitives (primitive meaning "basic building block", not Java
primitive) into a standard Java program is *amazing*.

The real comparison is between FlumeJava and *Cascading*, which also lets you
stay in Java-land and has a query-plan optimizer.  I'm no expert in Cascading,
but it seems the primitives in Cascading are "verbs" related to flows, while
FlumeJava really settles on a DistributedDataSet (PCollection, for them) as the
object which has methods, and which can be passed to methods of other (either
distributed or normal) objects.  I don't know if that is clearly better, but it
certainly seems more in line with the way most people program in Java.
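
To make the "dataset as an object" point concrete, here is a rough sketch of
what that style looks like in plain Java.  Everything in it is made up for
illustration: LocalCollection is a hypothetical in-memory stand-in, not the
FlumeJava PCollection API, and Normalizer is just an ordinary class that
happens to take the collection as a method argument.  It only borrows the
parallelDo name from the paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical in-memory stand-in for a distributed collection.
// This is NOT the FlumeJava API; it only mimics the "object with methods" shape.
final class LocalCollection<T> {
    private final List<T> items;

    LocalCollection(List<T> items) {
        this.items = new ArrayList<>(items);
    }

    // Apply fn to every element and return a new collection
    // (a local, sequential analogue of a parallel map step).
    <R> LocalCollection<R> parallelDo(Function<? super T, ? extends R> fn) {
        List<R> out = new ArrayList<>();
        for (T item : items) {
            out.add(fn.apply(item));
        }
        return new LocalCollection<>(out);
    }

    // Copy the contents out as an ordinary list.
    List<T> toList() {
        return new ArrayList<>(items);
    }
}

// A "normal" object: the collection is passed to its method like any other value.
final class Normalizer {
    LocalCollection<String> normalize(LocalCollection<String> words) {
        return words.parallelDo(w -> w.trim().toLowerCase());
    }
}

final class Example {
    public static void main(String[] args) {
        LocalCollection<String> words =
            new LocalCollection<>(Arrays.asList(" Hadoop", "PIG ", "Hive"));
        // Chain a method on another object with a method on the collection itself.
        LocalCollection<Integer> lengths =
            new Normalizer().normalize(words).parallelDo(String::length);
        System.out.println(lengths.toList());
    }
}
```

The point is only that the collection is a regular typed value you can hand
to other objects and chain calls on, rather than a flow you wire together
with verbs.  A real library would defer and optimize the work instead of
running it eagerly like this toy does.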

> Implement a FlumeJava-like library for operations over parallel collections 
> using Hadoop MapReduce
> --------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1849
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1849
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Jeff Hammerbacher
>
> The API used internally at Google is described in great detail at 
> http://portal.acm.org/citation.cfm?id=1806596.1806638.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
