compile what against hadoop 2.4? Tez? Hopefully no one except Tez devs ever compiles Tez (once the apache committers offer up pre-built binaries, I only ever do for this reason).

if compiling application code against Tez and Hadoop 2.4, the jar won't come into play unless running tests (so i believe).

I would then enhance option two to gracefully fail if -acls (the Manager) is not applicable (on hadoop 2.4) but mistakenly included in the 2.4 classpath (testing app code against hadoop 2.4).

of course then this is really option 1, now with two jars.

ckw

> On Mar 2, 2015, at 3:05 PM, Hitesh Shah <[email protected]> wrote:
>
> Thanks for the suggestions, Chris. Filed TEZ-2168 for this.
>
> At this point, I am inclined to follow option 2 mainly to retain the ability
> for users to compile against hadoop 2.4. I am not sure if there is a simple
> and performant way ( without using reflection for all 2.6 specific calls ) to
> retain compile compatibility with option 1.
>
> Any other comments from other folks on this issue in general or on the 2
> options that Chris suggested?
>
> thanks
> -- Hitesh
>
>
> On Feb 26, 2015, at 1:18 PM, Chris K Wensel <[email protected]> wrote:
>
>> The immediate issue is having two mutually exclusive artifacts:
>> tez-yarn-timeline-history and tez-yarn-timeline-history-with-acls
>>
>> outside of ATSHistoryACLPolicyManager, the code is identical. just the
>> dependencies are changed.
>>
>> TezClient attempts to load this Manager, under the assumption if it exists,
>> it is running on hadoop 2.6. (running on 2.4 is fatal)
>>
>> My recommendation would be never to change artifact names (or conditionally
>> choose them) inside of major releases, but accreting new, optional, ones as
>> versions progress is fine.
>>
>> thus I would either:
>>
>> create a single artifact tez-yarn-timeline-history compiled with a default
>> dep of hadoop 2.6, that includes the Manager. update the TezClient code to
>> gracefully fail if the Manager is not applicable (the runtime env is Hadoop
>> 2.4).
>>
>> or
>>
>> offer tez-yarn-timeline-history-with-acls as an optional artifact for Hadoop
>> 2.6 deployments, with the single Manager class in it, which in turn requires
>> the tez-yarn-timeline-history artifact -- which is sufficient for a 2.4
>> runtime. if the user provides the additional -with-acls artifact, they are
>> knowingly going to have problems on Hadoop 2.4.
>>
>> I prefer the first as it keeps my build file simple. graceful degradation of
>> services per environment (with appropriate logging) is a well accepted
>> practice.
>>
>> and you can now test Tez across multiple versions of Hadoop/Yarn at runtime
>> (outside of compile time).
>>
>> we do this with Cascading, just simple build file modifications to verify
>> binary compatibility (vendors fork this repo to verify their distributions,
>> and have been known to find critical bugs):
>>
>> https://github.com/Cascading/cascading.compatibility
>>
>> ckw
>>
>>> On Feb 26, 2015, at 11:03 AM, Hitesh Shah <[email protected]> wrote:
>>>
>>> Hi folks,
>>>
>>> Chris raised a good point earlier in terms of publishing jars for use
>>> against different versions of hadoop. For the most part, I think we have
>>> done well to ensure that the user-facing modules are version agnostic but
>>> the same does not hold for other modules which at times are needed by
>>> other applications for testing.
>>>
>>> There aren’t really too many different options we can try. The simplest
>>> option I can think of is just to build tez against different versions of
>>> hadoop with the tez.version set to something along the lines of
>>> “tez.version-hadoop.version”. This would imply having
>>> tez-api-0.6.0-hadoop2.4 or tez-api-0.6.0-hadoop2.6. From a usability point of
>>> view, depending on the option we pick, users will need to switch their
>>> dependencies to point to an appropriate version based on what version of
>>> hadoop they are using.
>>> For apps such as hive and pig, they will need to
>>> manage picking a particular version of tez based on which hadoop profile
>>> they are building against.
>>>
>>> Any other suggestions for publishing version dependent jars?
>>>
>>> For binary releases, should we release only the minimal tarball? or both
>>> the minimal and full tarballs? The full tarball is the recommended
>>> deployment model as it is more robust towards compatibility on a changing
>>> cluster. It should work in most scenarios as long as the hadoop client
>>> libraries that Tez depends on are compatible with the servers running on
>>> the cluster.
>>>
>>> General questions for the community/past release managers:
>>> - Should we retain the simple version ( i.e. plain x.y.z only ) when
>>> building against the default version of hadoop as determined by Tez? This
>>> “default.version” will have a tendency to evolve over time :) . These
>>> simple version jars would be in addition to the version specific jars.
>>> - What versions of hadoop should we compile against? 2.2, 2.4 and 2.6 or
>>> 2.2, 2.3, 2.4, 2.5, 2.6? Please note that I am ignoring the minor version so
>>> we should pick the latest version in each line i.e. 2.2.1 over 2.2.0 if
>>> 2.2.1 exists.
>>>
>>> Any other comments?
>>>
>>> thanks
>>> -- Hitesh
>>>
>>>
>>
>> --
>> Chris K Wensel
>> [email protected]
>>
>>
>

--
Chris K Wensel
[email protected]
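
For reference, the "gracefully fail if the Manager is not applicable" idea from this thread could look roughly like the Java sketch below. This is only an illustration under assumptions, not the actual TezClient code: `OptionalPolicyLoader`, `AclPolicyManager` and `NoOpManager` are hypothetical names made up for the example. The loader tries to instantiate an optional class by name and, if the class or one of its dependencies is missing from the runtime classpath, logs the problem and continues without it.

```java
import java.util.Optional;
import java.util.logging.Logger;

public class OptionalPolicyLoader {
    private static final Logger LOG =
        Logger.getLogger(OptionalPolicyLoader.class.getName());

    /** Hypothetical stand-in for whatever interface the optional manager implements. */
    public interface AclPolicyManager {
        String name();
    }

    /** Hypothetical trivial implementation, only here so the sketch can be exercised. */
    public static class NoOpManager implements AclPolicyManager {
        @Override
        public String name() {
            return "no-op";
        }
    }

    /**
     * Tries to load and instantiate an optional manager by class name.
     * Returns Optional.empty() instead of throwing when the class cannot be used.
     */
    public static Optional<AclPolicyManager> load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            Object instance = clazz.getDeclaredConstructor().newInstance();
            return Optional.of((AclPolicyManager) instance);
        } catch (ClassNotFoundException e) {
            // The optional jar is simply not on the classpath.
            LOG.info("Optional manager " + className + " not found; continuing without it");
        } catch (LinkageError e) {
            // The class is present but something it links against is missing,
            // e.g. an API that only exists in a newer runtime.
            LOG.warning("Optional manager " + className + " not usable here: " + e);
        } catch (ReflectiveOperationException | ClassCastException e) {
            // No usable no-arg constructor, constructor failure, or wrong type.
            LOG.warning("Optional manager " + className + " could not be created: " + e);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : NoOpManager.class.getName();
        System.out.println("manager available: " + load(name).isPresent());
    }
}
```

Catching `LinkageError` as well as `ClassNotFoundException` matters for the "mistakenly included in the 2.4 classpath" case: the class itself is found, but the types it depends on are not.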
