There was a talk by Ken Krugler about Cascading + Flink at ApacheCon NA in Vancouver, May 2016.
There is a Cascading extension for Flink, but it's not part of Apache Flink; it's a side project from data Artisans: https://github.com/dataArtisans/cascading-flink

But as you said, there are no performance numbers for this.

I am not sure if it makes sense to include Cascading extensions as part of Pirk, but I would let others weigh in here.

On Thu, Jul 28, 2016 at 10:36 PM, Darin Johnson <[email protected]> wrote:

> Cascading is a higher level API for Hadoop-mapreduce, Tez and Flink. The
> Pirk roadmap mentions support for a number of other frameworks (Flink and
> Storm being two), this would take care of Flink and add Tez support as
> well.
>
> If there's interest I'll add a JIRA and link other issues accordingly.
>
> I don't think there will be any license issues as:
>
> 1. Cascading is Apache Licensed.
> 2. Elastic Search dependencies are pulling in the dependencies
>    already, and RAT passes.
>
> There are good reasons not to go with this approach as well. Including:
>
> 1. Cascading is not an Apache Project - it's pretty much only Concurrent
>    calling the shots.
> 2. Usually cascading is pretty good about optimizing Map/Reduce jobs,
>    however Tez and Flink extensions are new so I'm uncertain about the
>    performance hit vs native implementations.
>
> These may be blockers for inclusion in the project or making it part of a
> contrib section. Thought I'd open it up for discussion.
>
> Darin
