[ https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877477#action_12877477 ]
Jake Mannix commented on MAPREDUCE-1849:
----------------------------------------

[quote]
The main difference from Pig seems to be allowing users to work in Java.
[quote]

To add my $0.02: FlumeJava lets developers work in an object-oriented language, *period*. The difference between writing a Pig "script" or a SQL (or Hive variant thereof) "query" and being able to seamlessly integrate distributed primitives ("primitive" meaning "basic building block", not a Java primitive) into a standard Java program is *amazing*.

The real comparison is between FlumeJava and *Cascading*, which also lets you stay in Java-land and has a query-plan optimizer. I'm no expert in Cascading, but it seems the primitives in Cascading are "verbs" related to flows, while FlumeJava settles on a DistributedDataSet (PCollection, for them) as the object which has methods and can be passed to methods of other (either distributed or normal) objects. I don't know if that is clearly better, but it certainly seems more in line with the way most people program in Java.

> Implement a FlumeJava-like library for operations over parallel collections using Hadoop MapReduce
> --------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1849
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1849
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Jeff Hammerbacher
>
> The API used internally at Google is described in great detail at http://portal.acm.org/citation.cfm?id=1806596.1806638.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.