I have mixed feelings on it. I like Cascading, so my initial reaction is to want to do it. However, we already have MapReduce code, and while I know we'd pick up Flink and Tez with Cascading, I'd rather pick up Flink via Beam instead (and I wouldn't be surprised if there's eventually a Tez DataRunner for Beam). I'd like to see how a Cascading prototype of Pirk compares against our code in MR. If their optimizations help out a lot, it would be a nice win.
On Thu, Jul 28, 2016 at 10:36 PM, Darin Johnson <[email protected]> wrote:

> Cascading is a higher level API for Hadoop-mapreduce, Tez and Flink. The
> Pirk roadmap mentions support for a number of other frameworks (Flink and
> Storm being two), this would take care of Flink and add Tez support as
> well.
>
> If there's interest I'll add a JIRA and link other issues accordingly.
>
> I don't think there will be any license issues as:
>
> 1. Cascading is Apache Licensed.
> 2. Elastic Search dependencies are pulling in the dependencies
>    already, and RAT passes.
>
> There are good reasons not to go with this approach as well. Including:
>
> 1. Cascading is not an Apache Project - it's pretty much only Concurrent
>    calling the shots.
> 2. Usually cascading is pretty good about optimizing Map/Reduce jobs,
>    however Tez and Flink extensions are new so I'm uncertain about the
>    performance hit vs native implementations.
>
> These may be blockers for inclusion in the project or making it part of a
> contrib section. Thought I'd open it up for discussion.
>
> Darin
