Not everybody is ready to move to a new Hadoop version every so often. As Chris already mentioned, it is a good idea to keep artifact names stable and to detect features at runtime. We do the same in Cascading: we compile it against one version of Hadoop, but do everything we can to keep it compatible with older and newer releases (currently 9 releases): https://github.com/Cascading/cascading.compatibility. This is more work for us as an upstream project, but it makes the lives of our users a lot easier. Note that we do not publish a release per Hadoop version; instead, we ensure that the one release is binary compatible.
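A minimal sketch of the "detect features at runtime" approach, in plain Java with no Hadoop dependency: probe for an optional class and degrade gracefully (with a log message) when it is absent, instead of failing. This is not TezClient's actual code. The class name `org.example.hypothetical.AclPolicyManager` is a hypothetical stand-in for an optional, version-specific component such as the ACL manager discussed later in the thread, and the use of `java.util.logging` is an assumption made to keep the example self-contained.

```java
import java.util.Optional;
import java.util.logging.Logger;

public class OptionalFeature {
    private static final Logger LOG = Logger.getLogger(OptionalFeature.class.getName());

    /** Returns the class if it is on the classpath, or empty if it is absent. */
    static Optional<Class<?>> probe(String className) {
        try {
            // initialize=false: only check presence, do not run static initializers yet
            return Optional.of(Class.forName(className, false, OptionalFeature.class.getClassLoader()));
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, e.g. when the class exists
            // but one of the APIs it depends on is missing from the runtime
            return Optional.empty();
        }
    }

    /** Instantiates the optional feature, or logs a warning and returns empty if it is not applicable. */
    static Optional<Object> loadOrDegrade(String className) {
        Optional<Class<?>> cls = probe(className);
        if (!cls.isPresent()) {
            LOG.warning("Optional feature " + className + " not available; continuing without it");
            return Optional.empty();
        }
        try {
            return Optional.of(cls.get().getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | LinkageError e) {
            LOG.warning("Optional feature " + className + " could not be initialized: " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Hypothetical optional class: absent, so the feature is skipped (prints false)
        System.out.println(loadOrDegrade("org.example.hypothetical.AclPolicyManager").isPresent());
        // A JDK class that is always present, to show the success path (prints true)
        System.out.println(loadOrDegrade("java.util.ArrayList").isPresent());
    }
}
```

The design choice here is that a missing optional component is a logged condition rather than a fatal error, so one binary can run against several runtime environments.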
I believe Tez should provide a binary release that is tested and compatible with multiple versions of Hadoop, instead of "compile your own". While I understand that the ASF only demands source releases, I believe having binary releases that are compatible with multiple versions of Hadoop will help with adoption, since it removes friction downstream.

- André

> On 08 Mar 2015, at 22:54, Bikas Saha wrote:
>
> As an aside, Flink could consider moving to a more current version. There
> have been many key improvements in Timeline Server, preemption, node labels,
> resource monitoring etc. that users may want to take advantage of.
>
> If Tez publishes Hadoop-version-specific binaries to Maven, then Flink and
> others may be able to consume them directly during development.
>
> Bikas
>
> -----Original Message-----
> From: Robert Metzger
> Sent: Sunday, March 08, 2015 6:40 AM
> Subject: Re: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies
>
> Hi Hitesh,
>
> I've talked about this with Kostas; let me check some of our assumptions.
>
> You can compile Flink against a hadoop1 and a hadoop2 profile. We would
> include flink-on-tez only in our (default) hadoop2 profile.
> For that profile, we use Hadoop 2.2.0.
>
> As you can see on Maven Central, we publish two versions of each Flink
> module for each release: 0.8.1-hadoop1 and 0.8.1.
> This way, users of both Hadoop APIs can use our system.
>
> Adding Tez as a dependency to Flink (hadoop2) would cause a dependency
> conflict on the Hadoop version. Our parent pom enforces Hadoop 2.2.0 for all
> dependencies, so we force Tez to use Hadoop 2.2.0 as well.
> As I understand it, the compilation fails in that case.
>
> If there were a Tez version compatible with Hadoop 2.2.0 on Maven Central,
> we could add the "flink-on-tez" module to Maven Central.
>
> If that's not possible, users who want to use Flink-on-Tez have to compile
> Flink against Hadoop 2.6.0 themselves. It's only one Maven command, but it is
> less convenient than something on Maven Central.
>
>
> On Fri, Mar 6, 2015 at 8:03 PM, Hitesh Shah wrote:
>
>> Thanks for the feedback, Kostas.
>>
>> One clarification, though: are you saying Tez should publish jars to
>> Maven Central built against different versions of Hadoop? If yes, is
>> this mainly due to the Hadoop dependencies that Tez pulls in, or due to
>> any incompatibilities that you have noticed?
>>
>> thanks
>> - Hitesh
>>
>>
>> On Mar 6, 2015, at 9:03 AM, Kostas Tzoumas wrote:
>>
>>> Publishing jars for different Hadoop dependencies, and in particular
>>> for Hadoop 2.2, would also be beneficial for Flink on Tez, as we offer
>>> Maven archetypes for users to create Flink applications. Currently,
>>> we need to ask users who want to run Flink apps with Tez as the backend
>>> to compile the Flink code themselves due to a Hadoop version mismatch.
>>>
>>>
>>> On Thu, Mar 5, 2015 at 1:46 AM, Hitesh Shah wrote:
>>>
>>>> From an ASF perspective, verifiable releases are only source releases.
>>>> The binaries are just convenience artifacts that can also be made
>>>> available with a given release. Hence, in terms of supporting multiple
>>>> Hadoop versions, we do want to allow various users/distros to compile
>>>> Tez against their particular version of Hadoop.
>>>>
>>>> From a run-time point of view, if Tez compiled against hadoop-2.6 is run
>>>> on a 2.4 cluster, it should work normally as long as ACLs are disabled
>>>> (via the Tez config tez.am.acls.enabled). That said, there are probably
>>>> some improvements that could be made to handle the case where ACLs are
>>>> enabled on a 2.4 cluster more cleanly.
>>>>
>>>> thanks
>>>> - Hitesh
>>>>
>>>> On Mar 4, 2015, at 9:21 AM, Chris K Wensel wrote:
>>>>
>>>>> Compile what against Hadoop 2.4? Tez? Hopefully no one except Tez devs
>>>>> ever compiles Tez once the Apache committers offer up pre-built
>>>>> binaries (this is the only reason I ever do).
>>>>>
>>>>> If compiling application code against Tez and Hadoop 2.4, the jar
>>>>> won't come into play unless running tests (so I believe).
>>>>>
>>>>> I would then enhance option two to fail gracefully if -acls (the
>>>>> Manager) is not applicable (on Hadoop 2.4) but is mistakenly included
>>>>> in the 2.4 classpath (testing app code against Hadoop 2.4).
>>>>>
>>>>> Of course, this is then really option 1 with two jars.
>>>>>
>>>>> ckw
>>>>>
>>>>>> On Mar 2, 2015, at 3:05 PM, Hitesh Shah wrote:
>>>>>>
>>>>>> Thanks for the suggestions, Chris. Filed TEZ-2168 for this.
>>>>>>
>>>>>> At this point, I am inclined to follow option 2, mainly to retain the
>>>>>> ability for users to compile against Hadoop 2.4. I am not sure if there
>>>>>> is a simple and performant way (without using reflection for all
>>>>>> 2.6-specific calls) to retain compile compatibility with option 1.
>>>>>>
>>>>>> Any other comments from other folks on this issue in general, or on
>>>>>> the 2 options that Chris suggested?
>>>>>>
>>>>>> thanks
>>>>>> - Hitesh
>>>>>>
>>>>>>
>>>>>> On Feb 26, 2015, at 1:18 PM, Chris K Wensel wrote:
>>>>>>
>>>>>>> The immediate issue is having two mutually exclusive artifacts:
>>>>>>> tez-yarn-timeline-history and tez-yarn-timeline-history-with-acls.
>>>>>>>
>>>>>>> Outside of ATSHistoryACLPolicyManager, the code is identical; just
>>>>>>> the dependencies are changed.
>>>>>>>
>>>>>>> TezClient attempts to load this Manager, under the assumption that if
>>>>>>> it exists, it is running on Hadoop 2.6.
>>>>>>> (Running on 2.4 is fatal.)
>>>>>>>
>>>>>>> My recommendation would be never to change artifact names (or
>>>>>>> conditionally choose them) within major releases, but accreting new,
>>>>>>> optional ones as versions progress is fine.
>>>>>>>
>>>>>>> Thus I would either:
>>>>>>>
>>>>>>> create a single artifact, tez-yarn-timeline-history, compiled with a
>>>>>>> default dependency on Hadoop 2.6, that includes the Manager, and update
>>>>>>> the TezClient code to fail gracefully if the Manager is not applicable
>>>>>>> (the runtime environment is Hadoop 2.4).
>>>>>>>
>>>>>>> or
>>>>>>>
>>>>>>> offer tez-yarn-timeline-history-with-acls as an optional artifact for
>>>>>>> Hadoop 2.6 deployments, with the single Manager class in it, which in
>>>>>>> turn requires the tez-yarn-timeline-history artifact (which is
>>>>>>> sufficient for a 2.4 runtime). If the user provides the additional
>>>>>>> -with-acls artifact, they are knowingly going to have problems on
>>>>>>> Hadoop 2.4.
>>>>>>>
>>>>>>> I prefer the first, as it keeps my build file simple. Graceful
>>>>>>> degradation of services per environment (with appropriate logging)
>>>>>>> is a well-accepted practice.
>>>>>>>
>>>>>>> And you can then test Tez across multiple versions of Hadoop/YARN at
>>>>>>> runtime (outside of compile time).
>>>>>>>
>>>>>>> We do this with Cascading, with just simple build file modifications
>>>>>>> to verify binary compatibility (vendors fork this repo to verify their
>>>>>>> distributions, and have been known to find critical bugs):
>>>>>>>
>>>>>>> https://github.com/Cascading/cascading.compatibility
>>>>>>>
>>>>>>> ckw
>>>>>>>
>>>>>>>> On Feb 26, 2015, at 11:03 AM, Hitesh Shah wrote:
>>>>>>>>
>>>>>>>> Hi folks,
>>>>>>>>
>>>>>>>> Chris raised a good point earlier about publishing jars for use
>>>>>>>> against different versions of Hadoop.
>>>>>>>> For the most part, I think we have done well to ensure that the
>>>>>>>> user-facing modules are version agnostic, but the same does not hold
>>>>>>>> for other modules, which at times are needed by other applications
>>>>>>>> for testing.
>>>>>>>>
>>>>>>>> There aren't really too many different options we can try. The
>>>>>>>> simplest option I can think of is just to build Tez against different
>>>>>>>> versions of Hadoop with tez.version set to something along the lines
>>>>>>>> of "tez.version-hadoop.version". This would imply having
>>>>>>>> tez-api-0.6.0-hadoop2.4 or tez-api-0.6.0-hadoop2.6. From a usability
>>>>>>>> point of view, depending on the option we pick, users will need to
>>>>>>>> switch their dependencies to point to an appropriate version based on
>>>>>>>> the version of Hadoop they are using. Apps such as Hive and Pig will
>>>>>>>> need to manage picking a particular version of Tez based on which
>>>>>>>> Hadoop profile they are building against.
>>>>>>>>
>>>>>>>> Any other suggestions for publishing version-dependent jars?
>>>>>>>>
>>>>>>>> For binary releases, should we release only the minimal tarball, or
>>>>>>>> both the minimal and full tarballs? The full tarball is the
>>>>>>>> recommended deployment model, as it is more robust towards
>>>>>>>> compatibility on a changing cluster. It should work in most scenarios
>>>>>>>> as long as the Hadoop client libraries that Tez depends on are
>>>>>>>> compatible with the servers running on the cluster.
>>>>>>>>
>>>>>>>> General questions for the community/past release managers:
>>>>>>>> - Should we retain the simple version (i.e. plain x.y.z) when
>>>>>>>> building against the default version of Hadoop as determined by Tez?
>>>>>>>> This "default.version" will have a tendency to evolve over time :).
>>>>>>>> These simple-version jars would be in addition to the
>>>>>>>> version-specific jars.
>>>>>>>> - What versions of Hadoop should we compile against? 2.2, 2.4 and 2.6,
>>>>>>>> or 2.2, 2.3, 2.4, 2.5, 2.6?
>>>>>>>> Please note that I am ignoring the patch version here, so we should
>>>>>>>> pick the latest release in each line, i.e. 2.2.1 over 2.2.0 if 2.2.1
>>>>>>>> exists.
>>>>>>>>
>>>>>>>> Any other comments?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>> - Hitesh
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Chris K Wensel
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Chris K Wensel
>>>>>
>>>>
>>

-- 
André Kelpe
http://concurrentinc.com
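Hitesh's proposal in the thread (one build per Hadoop release line, compiled against the latest patch release in that line, with artifact versions of the form "tez.version-hadoop.version") can be sketched as a small build-matrix helper in plain Java. The list of Hadoop releases is hypothetical input, and suffixing artifacts with the major.minor line (as in tez-api-0.6.0-hadoop2.4) is one reading of the proposal, not a settled Tez convention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HadoopBuildMatrix {
    /** Parses "x.y.z" into {x, y, z}; anything else is rejected. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected x.y.z: " + version);
        }
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])};
    }

    /** For each major.minor line, keeps the release with the highest patch number. */
    static Map<String, String> latestPerLine(List<String> releases) {
        // TreeMap keys sort lexicographically, which is fine for single-digit minors like 2.2..2.6
        Map<String, String> latest = new TreeMap<>();
        for (String release : releases) {
            int[] v = parse(release);
            String line = v[0] + "." + v[1];
            String current = latest.get(line);
            if (current == null || parse(current)[2] < v[2]) {
                latest.put(line, release);
            }
        }
        return latest;
    }

    /** Builds version strings such as "0.6.0-hadoop2.4", one per release line. */
    static List<String> artifactVersions(String tezVersion, Map<String, String> lines) {
        List<String> versions = new ArrayList<>();
        for (String line : lines.keySet()) {
            versions.add(tezVersion + "-hadoop" + line);
        }
        return versions;
    }

    public static void main(String[] args) {
        // Hypothetical list of available Hadoop releases
        List<String> releases = Arrays.asList("2.2.0", "2.4.0", "2.4.1", "2.6.0");
        Map<String, String> lines = latestPerLine(releases);
        System.out.println(lines);                              // {2.2=2.2.0, 2.4=2.4.1, 2.6=2.6.0}
        System.out.println(artifactVersions("0.6.0", lines));   // [0.6.0-hadoop2.2, 0.6.0-hadoop2.4, 0.6.0-hadoop2.6]
    }
}
```

Each map value is the exact Hadoop release to compile that line against, while the published version only carries the line, so a later patch release in the same line would not change artifact names.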
