...and we should move the Beam and Streaming integration discussions to a different thread.
On Mon, Jul 18, 2016 at 11:32 AM, Suneel Marthi <[email protected]> wrote:

> On Mon, Jul 18, 2016 at 11:03 AM, Chris Harris <[email protected]>
> wrote:
>
>> I agree separating the benchmarking code out might be a good idea. Maybe
>> we want to make this a multi-module project and the benchmarking be a
>> submodule? Won't JMH still need to be included via the child pom in the
>> benchmarking module, though? I don't think it can be marked as "provided"
>> or anything, right? Glad to hear if you had a different thought on how to
>> go with this though.
>>
>> Maybe it's good to create submodules for all the adapters (e.g. Spark, MR,
>> Flink, Storm, etc.) too? Then we won't have just one jar needing to carry
>> around all those dependencies that aren't used (and reduce potential
>> version conflicts). I don't know if this would shake things up too much to
>> do this restructuring now, or if it is better to do now before we move too
>> far down the line. We'd have to look at what's Pirk "core" vs not, but I
>> don't think that's too tough to discern right now. Most of the adapter
>> code is in org.apache.pirk.responder.
>
> I have a suggestion here. It's not a good idea to be having submodules for
> every streaming engine that's out there. In the long run, it's gonna be a
> maintenance and compatibility nightmare with every release of Flink, Spark,
> etc.
>
> I am guilty of having wasted the last 2 yrs of my life (and fellow
> committers' time) in adding support for Spark, Flink, H2O on the Apache
> Mahout project as distributed backend engines.
>
> I wish now that Apache Beam had been around in 2014; I would then have had
> to support only Beam and been abstracted away from all of the other
> Streaming frameworks.
>
> As a new project, I would suggest that we look into integrating with Beam
> and keep the codebase lean and not bloat the project with all the different
> Streaming engines.
>
> Beam is a unified Batch + Streaming processing engine from Google.
> All of the other Streaming engines like Flink, Spark, Apex, Storm, etc.
> are now vying to provide native runners that can support Beam. This kinda
> makes Beam an abstraction over every other streaming framework.
>
> As an application developer, I would write my jobs against the Beam API
> and have the option of being able to execute the same as a Spark batch job,
> Flink streaming/batch job etc. This completely shields the developer from
> having to support the plethora of streaming engines out there.
>
> On the Mahout project, we built a complete logical layer for coding up ML
> algorithms, and a physical layer that translates the logical plan to run on
> different execution engines like Spark, Flink. There was no Beam when we
> started that effort in early 2014.
> I would do it a different way today given Beam.
>
> It would be real cool by way of publicity and building a community for
> Pirk, if Pirk were to be one of the few projects out there that support
> Beam.
>
>> Regards,
>> Chris
>>
>> On Mon, Jul 18, 2016 at 6:25 AM, Tim Ellison <[email protected]>
>> wrote:
>>
>> > On 17/07/16 16:57, Ellison Anne Williams wrote:
>> > > Suneel -- Thanks for creating the JIRA issue and pointing out the
>> > > licensing problems. I see that JMH is under the GNU GPL2
>> > > (http://openjdk.java.net/legal/) which is not compatible with the
>> > > Apache license (http://www.apache.org/legal/resolved.html).
>> > >
>> > > It appears that Flink just removed the benchmarking code instead of
>> > > re-porting it to another option.
>> > >
>> > > I would like us to port it to another license-compatible benchmarking
>> > > framework such as Google Caliper (or something similar) instead of
>> > > removing the code as the benchmarking is important for encryption
>> > > optimization.
>> > >
>> > > Thoughts?
>> >
>> > JMH is GPLv2 with classpath exception [1], which means that it cannot be
>> > distributed as part of the ALv2 licensed works (Pirk), but there is no
>> > problem with using this library as a tool / dependency at runtime.
>> > After all, there is no Java runtime that allows for redistribution under
>> > ALv2 either!
>> >
>> > That said, running the "mvn package" target *does* put JMH generated
>> > code into the resulting pirk-0.0.1-SNAPSHOT.jar -- which then begs the
>> > question why Pirk is putting test code into the library?
>> >
>> > So if the benchmark code were not part of the delivery then you can
>> > continue to use JMH, but if that is there for a reason then we would
>> > have to switch to a compatible licensed framework.
>> >
>> > [1] http://hg.openjdk.java.net/code-tools/jmh/file/c050a47b2b37/LICENSE
>> >
>> > Regards,
>> > Tim
