[ https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877506#action_12877506 ]
Luke Lu commented on MAPREDUCE-1849:
------------------------------------

I had some experience with Cascading in production code. One of the major benefits of it being a Java library, from my POV, is easy unit testing of the various user-defined operations, which is inconvenient in most DSLs. OTOH, Cascading forces you to define data flows explicitly (which is not so bad if you have a nice FlowBuilder utility class).

FlumeJava, IMO, actually captures the essence of MapReduce, which originated in functional programming. The immutable P* collections and side-effect-free (no global effect) DoFns allow many optimization opportunities, a la Haskell's lazy evaluation (called deferred evaluation in the paper). However, the lack of type inference and closures in Java makes the usage much more verbose than necessary. I think similar libraries could be better implemented in Scala.

> Implement a FlumeJava-like library for operations over parallel collections
> using Hadoop MapReduce
> --------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1849
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1849
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Jeff Hammerbacher
>
> The API used internally at Google is described in great detail at
> http://portal.acm.org/citation.cfm?id=1806596.1806638.
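
To make the comment's point about deferred evaluation and verbosity concrete, here is a minimal sketch in plain Java. The names PCollection, DoFn, parallelDo and run echo the vocabulary of the FlumeJava paper, but everything below is a hypothetical illustration written for this thread, not the actual FlumeJava API or any existing Hadoop class. Because each DoFn is side-effect free and the collection is immutable, chained parallelDo calls can be composed into a single function and applied in one pass when run() is finally called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: names mimic the FlumeJava paper, not a real API.
public class DeferredSketch {

    /** A side-effect-free, element-wise function. */
    public interface DoFn<A, B> {
        B apply(A input);
    }

    /** An immutable collection that records a plan instead of computing eagerly. */
    public static final class PCollection<T> {
        private final List<Object> source;       // original input, never mutated
        private final DoFn<Object, T> pipeline;  // composition of all DoFns so far

        private PCollection(List<Object> source, DoFn<Object, T> pipeline) {
            this.source = source;
            this.pipeline = pipeline;
        }

        /** Wraps a list; the starting pipeline is the identity function. */
        public static <T> PCollection<T> of(List<T> items) {
            return new PCollection<T>(new ArrayList<Object>(items), new DoFn<Object, T>() {
                @SuppressWarnings("unchecked")
                public T apply(Object input) {
                    return (T) input;
                }
            });
        }

        /** Deferred: composes fn onto the plan and returns a new collection. */
        public <U> PCollection<U> parallelDo(final DoFn<T, U> fn) {
            final DoFn<Object, T> previous = pipeline;
            return new PCollection<U>(source, new DoFn<Object, U>() {
                public U apply(Object input) {
                    return fn.apply(previous.apply(input));
                }
            });
        }

        /** Forces evaluation: one pass over the source with the fused function. */
        public List<T> run() {
            List<T> out = new ArrayList<T>();
            for (Object item : source) {
                out.add(pipeline.apply(item));
            }
            return out;
        }
    }

    /** Two chained steps; nothing is computed until run(). */
    public static List<Integer> example() {
        PCollection<String> words = PCollection.of(Arrays.asList("map", "reduce", "flume"));
        PCollection<Integer> lengths = words
            .parallelDo(new DoFn<String, String>() {
                public String apply(String s) {
                    return s.toUpperCase();
                }
            })
            .parallelDo(new DoFn<String, Integer>() {
                public Integer apply(String s) {
                    return s.length();
                }
            });
        return lengths.run();
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints [3, 6, 5]
    }
}
```

Each anonymous inner class above spends five lines on what is a one-line function literal in a language with closures and type inference, which is the verbosity the comment refers to.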