Compile what against Hadoop 2.4? Tez? Hopefully no one except Tez devs will 
ever need to compile Tez once the Apache committers offer up pre-built 
binaries (the lack of them is the only reason I ever do).

If compiling application code against Tez and Hadoop 2.4, the jar won't come 
into play unless running tests (so I believe).

I would then enhance option two to fail gracefully if -acls (the Manager) is 
not applicable (on Hadoop 2.4) but is mistakenly included in the 2.4 classpath 
(e.g. when testing app code against Hadoop 2.4).

Of course, this is then really option 1, just with two jars.
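
For illustration, a minimal sketch of the "fail gracefully" idea: load the 
optional class reflectively and carry on without it if the class, or 
anything it links against, is missing. This is not the actual TezClient 
code. The OptionalPluginLoader class and its tryLoad helper are hypothetical 
names, and the fully qualified name of the Manager used in main is an 
assumption.

```java
import java.util.Optional;

// Hypothetical helper, not part of Tez: loads an optional plugin class and
// degrades gracefully when it cannot be used in the current runtime.
final class OptionalPluginLoader {

    // Returns an instance of className, or empty if the class (or one of the
    // classes it depends on) is unavailable in this environment.
    static Optional<Object> tryLoad(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return Optional.of(clazz.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | LinkageError e) {
            // ClassNotFoundException: the jar is not on the classpath.
            // LinkageError (e.g. NoClassDefFoundError): the jar is present but
            // a class it needs is missing, as with newer-only APIs on an older
            // runtime.
            System.err.println("Optional plugin " + className
                + " not usable, continuing without it: " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Assumed class name, for illustration only.
        String manager =
            "org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager";
        Optional<Object> acls = tryLoad(manager);
        System.out.println(acls.isPresent()
            ? "ACL manager loaded"
            : "Running without ACL manager");
    }
}
```

Catching LinkageError alongside the reflective exceptions is the important 
part here: a jar that is present but compiled against missing APIs surfaces 
as an Error, not as a ClassNotFoundException.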

ckw

> On Mar 2, 2015, at 3:05 PM, Hitesh Shah <[email protected]> wrote:
> 
> Thanks for the suggestions, Chris. Filed TEZ-2168 for this. 
> 
> At this point, I am inclined to follow option 2 mainly to retain the ability 
> for users to compile against hadoop 2.4. I am not sure if there is a simple 
> and performant way ( without using reflection for all 2.6 specific calls ) to 
> retain compile compatibility with option 1.
> 
> Any other comments from other folks on this issue in general or on the 2 
> options that Chris suggested? 
> 
> thanks
> — Hitesh
> 
> 
> On Feb 26, 2015, at 1:18 PM, Chris K Wensel <[email protected]> wrote:
> 
>> The immediate issue is having two mutually exclusive artifacts: 
>> tez-yarn-timeline-history and tez-yarn-timeline-history-with-acls
>> 
>> Outside of ATSHistoryACLPolicyManager, the code is identical; only the 
>> dependencies are changed.
>> 
>> TezClient attempts to load this Manager under the assumption that, if it 
>> exists, it is running on Hadoop 2.6 (running it on 2.4 is fatal).
>> 
>> My recommendation would be never to change artifact names (or conditionally 
>> choose them) inside of major releases, but accreting new, optional, ones as 
>> versions progress is fine.
>> 
>> thus I would either:
>> 
>> create a single artifact tez-yarn-timeline-history compiled with a default 
>> dep of hadoop 2.6, that includes the Manager. update the TezClient code to 
>> gracefully fail if the Manager is not applicable (the runtime env is Hadoop 
>> 2.4).
>> 
>> or
>> 
>> offer tez-yarn-timeline-history-with-acls as an optional artifact for Hadoop 
>> 2.6 deployments, with the single Manager class in it, which in turn requires 
>> the tez-yarn-timeline-history artifact -- which is sufficient for a 2.4 
>> runtime. if the user provides the additional -with-acls artifact, they are 
>> knowingly going to have problems on Hadoop 2.4.
>> 
>> I prefer the first as it keeps my build file simple. graceful degradation of 
>> services per environment (with appropriate logging) is a well accepted 
>> practice.
>> 
>> and you can now test Tez across multiple versions of Hadoop/YARN at runtime 
>> (outside of compile time).
>> 
>> we do this with Cascading, just simple build file modifications to verify 
>> binary compatibility (vendors fork this repo to verify their distributions, 
>> and have been known to find critical bugs):
>> 
>> https://github.com/Cascading/cascading.compatibility
>> 
>> ckw
>> 
>>> On Feb 26, 2015, at 11:03 AM, Hitesh Shah <[email protected]> wrote:
>>> 
>>> Hi folks, 
>>> 
>>> Chris raised a good point earlier in terms of publishing jars for use 
>>> against different versions of hadoop. For the most part, I think we have 
>>> done well to ensure that the user-facing modules are version agnostic but 
>>> the same does not hold for other modules which at times are needed by 
>>> other applications for testing.
>>> 
>>> There aren’t really too many different options we can try.  The simplest 
>>> option I can think of is just to build tez against different versions of 
>>> hadoop with the tez.version set to something along the lines of 
>>> “tez.version-hadoop.version”. This would imply having 
>>> tez-api-0.6.0-hadoop2.4 or tez-api-0.6.0-hadoop2.6. From a usability point of 
>>> view, depending on the option we pick, users will need to switch their 
>>> dependencies to point to an appropriate version based on what version of 
>>> hadoop they are using. For apps such as hive and pig, they will need to 
>>> manage picking a particular version of tez based on which hadoop profile 
>>> they are building against. 
>>> 
>>> Any other suggestions for publishing version dependent jars?
>>> 
>>> For binary releases, should we release only the minimal tarball? or both 
>>> the minimal and full tarballs? The full tarball is the recommended 
>>> deployment model as it is more robust towards compatibility on a changing 
>>> cluster. It should work in most scenarios as long as the hadoop client 
>>> libraries that Tez depends on are compatible with the servers running on 
>>> the cluster.
>>> 
>>> General questions for the community/past release managers: 
>>> - Should we retain the simple version ( i.e. plain x.y.z only ) when 
>>> building against the default version of hadoop as determined by Tez? This 
>>> “default.version” will have a tendency to evolve over time :) . These 
>>> simple version jars would be in addition to the version specific jars. 
>>> - What versions of hadoop should we compile against? 2.2, 2.4 and 2.6 or 
>>> 2.2,2.3,2.4,2.5,2.6 ? Please note that I am ignoring the minor version so 
>>> we should pick the latest version in each line i.e. 2.2.1 over 2.2.0 if 
>>> 2.2.1 exists. 
>>> 
>>> Any other comments? 
>>> 
>>> thanks
>>> — Hitesh
>>> 
>>> 
>> 
>> —
>> Chris K Wensel
>> [email protected]
>> 
>> 
>> 
>> 
> 

—
Chris K Wensel
[email protected]



