Ken Krugler gave a talk about Cascading + Flink at ApacheCon NA in
Vancouver, May 2016.

Flink does provide a Cascading extension, but it's not part of Apache Flink;
it is a side project from data Artisans:
https://github.com/dataArtisans/cascading-flink

But as you said, there are no performance numbers for this.

I am not sure if it makes sense to include Cascading extensions as part of
Pirk, but I would let others weigh in here.



On Thu, Jul 28, 2016 at 10:36 PM, Darin Johnson <[email protected]>
wrote:

> Cascading is a higher-level API for Hadoop MapReduce, Tez and Flink.  The
> Pirk roadmap mentions support for a number of other frameworks (Flink and
> Storm being two); this would take care of Flink and add Tez support as
> well.
>
> If there's interest I'll add a JIRA and link other issues accordingly.
>
> I don't think there will be any license issues as:
>
>
>    1.   Cascading is Apache-licensed.
>    2.   The Elasticsearch dependencies already pull in the Cascading
>    dependencies, and RAT passes.
>
> There are also good reasons not to go with this approach, including:
>
>    1. Cascading is not an Apache project - it's pretty much only Concurrent
>    calling the shots.
>    2. Usually Cascading is pretty good about optimizing Map/Reduce jobs;
>    however, the Tez and Flink extensions are new, so I'm uncertain about the
>    performance hit vs native implementations.
>
> These may be blockers for inclusion in the project or making it part of a
> contrib section.  Thought I'd open it up for discussion.
>
> Darin
>
