[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877506#action_12877506
 ] 

Luke Lu commented on MAPREDUCE-1849:
------------------------------------

I have had some experience with Cascading in production code. From my POV, one 
of the major benefits of its being a Java library is easy unit testing of 
user-defined operations, which is inconvenient in most DSLs. OTOH, Cascading 
forces you to define data flows explicitly (which is not so bad if you have a 
nice FlowBuilder utility class).
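
To illustrate the unit-testing point, here is a minimal sketch in plain Java. 
It deliberately does not use Cascading's API; the class WordLengthOp and its 
methods are hypothetical names made up for this example. The idea is only that 
an operation written as ordinary Java code can be checked with a direct method 
call, with no cluster or job setup.

```java
public class WordLengthOp {

    /** Hypothetical user-defined operation: plain Java, no framework types. */
    static int wordLength(String word) {
        return word == null ? 0 : word.trim().length();
    }

    /** A unit test is just a direct call plus a check on the result. */
    static void testWordLength() {
        if (wordLength(" hadoop ") != 6) {
            throw new AssertionError("expected trimmed length 6");
        }
        if (wordLength(null) != 0) {
            throw new AssertionError("expected 0 for null input");
        }
    }

    public static void main(String[] args) {
        testWordLength();
        System.out.println("ok");
    }
}
```

In a real project the check would more likely live in a JUnit test, but the 
operation itself would not need to change.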

FlumeJava, IMO, actually captures the essence of MapReduce, which originated in 
functional programming. The immutable P* collections and side-effect-free (no 
global effect) DoFns allow many optimization opportunities a la Haskell's 
lazy evaluation (called deferred evaluation in the paper). However, the lack of 
type inference and closures in Java makes usage much more verbose than 
necessary. I think similar libraries could be better implemented in Scala.
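
A rough sketch of both points, deferred evaluation and the verbosity of Java 
without closures, might look like the following. This is not the FlumeJava 
API: Fn, Deferred, of, map and run are hypothetical names invented for 
illustration. Calling map() only records the work and fuses it with the 
previous function; nothing is computed until run() is called, which is the 
kind of freedom that side-effect-free functions give an optimizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DeferredSketch {

    /** Hypothetical stand-in for a side-effect-free DoFn: one input, one output. */
    interface Fn<A, B> {
        B apply(A input);
    }

    /** Hypothetical deferred collection: records the work, computes only in run(). */
    static final class Deferred<S, T> {
        private final List<S> source;
        private final Fn<S, T> fn;

        private Deferred(List<S> source, Fn<S, T> fn) {
            this.source = source;
            this.fn = fn;
        }

        /** Wraps a copy of the input list with the identity function. */
        static <S> Deferred<S, S> of(List<S> source) {
            return new Deferred<S, S>(new ArrayList<S>(source), new Fn<S, S>() {
                public S apply(S input) { return input; }
            });
        }

        /** Returns a new collection; the two functions are fused, nothing runs yet. */
        <U> Deferred<S, U> map(final Fn<T, U> next) {
            final Fn<S, T> prev = fn;
            return new Deferred<S, U>(source, new Fn<S, U>() {
                public U apply(S input) { return next.apply(prev.apply(input)); }
            });
        }

        /** Evaluates the fused function in a single pass over the source. */
        List<T> run() {
            List<T> out = new ArrayList<T>();
            for (S item : source) {
                out.add(fn.apply(item));
            }
            return out;
        }
    }

    /** Two chained steps, each needing a full anonymous class in place of a closure. */
    static List<Integer> example() {
        Deferred<String, Integer> lengthsPlusOne =
            Deferred.of(Arrays.asList("map", "reduce"))
                .map(new Fn<String, Integer>() {
                    public Integer apply(String s) { return s.length(); }
                })
                .map(new Fn<Integer, Integer>() {
                    public Integer apply(Integer n) { return n + 1; }
                });
        return lengthsPlusOne.run();
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints [4, 7]
    }
}
```

Each step in example() costs several lines of anonymous-class boilerplate, 
where a language with closures and type inference would need roughly one 
short expression per step.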

> Implement a FlumeJava-like library for operations over parallel collections 
> using Hadoop MapReduce
> --------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1849
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1849
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Jeff Hammerbacher
>
> The API used internally at Google is described in great detail at 
> http://portal.acm.org/citation.cfm?id=1806596.1806638.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
