I have mixed feelings about it. I like Cascading, so my initial reaction is
to want to do it.  However, we already have MapReduce code, and I know we'd
pick up Flink and Tez with Cascading, but I'd rather pick up Flink via Beam
instead (and I wouldn't be surprised if there's eventually a Tez DataRunner
for Beam).  I'd like to see how a Cascading prototype of Pirk compares
against our code in MR.  If their optimizations help out a lot, it would be
a nice win.


On Thu, Jul 28, 2016 at 10:36 PM, Darin Johnson <[email protected]>
wrote:

> Cascading is a higher-level API for Hadoop MapReduce, Tez and Flink.  The
> Pirk roadmap mentions support for a number of other frameworks (Flink and
> Storm being two); this would take care of Flink and add Tez support as
> well.
>
> If there's interest I'll add a JIRA and link other issues accordingly.
>
> I don't think there will be any license issues as:
>
>
>    1.   Cascading is Apache Licensed.
>    2.   The Elasticsearch dependencies are already pulling in the Cascading
>    dependencies, and RAT passes.
>
> There are also good reasons not to go with this approach, including:
>
>    1. Cascading is not an Apache project - it's pretty much only Concurrent
>    calling the shots.
>    2. Usually Cascading is pretty good about optimizing Map/Reduce jobs;
>    however, the Tez and Flink extensions are new, so I'm uncertain about the
>    performance hit vs native implementations.
>
> These may be blockers for inclusion in the project or making it part of a
> contrib section.  Thought I'd open it up for discussion.
>
> Darin
>
