Thanks for the suggestions, Chris. Filed TEZ-2168 for this. 

At this point, I am inclined to follow option 2, mainly to retain the ability 
for users to compile against Hadoop 2.4. I am not sure there is a simple and 
performant way (without using reflection for all 2.6-specific calls) to 
retain compile compatibility with option 1.
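
For reference, here is a minimal Java sketch of the graceful-degradation idea behind option 1. It assumes a hypothetical helper, OptionalFeatures.tryCreate (not an existing Tez API), that loads an optional class by name, instantiates it through a no-arg constructor, and falls back to "feature disabled" when the class is missing or cannot be linked against the running Hadoop version:

```java
import java.util.Optional;
import java.util.logging.Logger;

/**
 * Hypothetical helper (not an existing Tez API): instantiates an optional,
 * Hadoop-version-specific class by name and degrades gracefully when it is
 * missing or cannot be linked against the runtime.
 */
public final class OptionalFeatures {
    private static final Logger LOG = Logger.getLogger(OptionalFeatures.class.getName());

    private OptionalFeatures() {}

    /**
     * Loads className, creates it via its no-arg constructor and casts it to type.
     * Returns Optional.empty() (after logging why) instead of failing.
     */
    public static <T> Optional<T> tryCreate(String className, Class<T> type) {
        try {
            Class<?> clazz = Class.forName(className);
            Object instance = clazz.getDeclaredConstructor().newInstance();
            return Optional.of(type.cast(instance));
        } catch (ClassNotFoundException e) {
            // The optional artifact is simply not on the classpath.
            LOG.info("Optional class " + className + " not found; feature disabled");
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            // Present but unusable, e.g. it references Hadoop 2.6-only classes
            // while running on Hadoop 2.4 (NoClassDefFoundError is a LinkageError).
            LOG.warning("Optional class " + className + " unusable; feature disabled: " + e);
        }
        return Optional.empty();
    }
}
```

TezClient could, for instance, pass the fully qualified name of ATSHistoryACLPolicyManager plus the interface it expects (assuming the class has a no-arg constructor), and skip ACL setup when the result is empty. One caveat: the JVM may resolve referenced classes lazily, so on a 2.4 runtime a later method call can still throw NoClassDefFoundError. That is why keeping compile compatibility this way may end up requiring guards around every 2.6-specific call.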

Any other comments from other folks on this issue in general, or on the two 
options that Chris suggested? 

thanks
— Hitesh


On Feb 26, 2015, at 1:18 PM, Chris K Wensel wrote:

> The immediate issue is having two mutually exclusive artifacts: 
> tez-yarn-timeline-history and tez-yarn-timeline-history
> 
> outside of ATSHistoryACLPolicyManager, the code is identical. just the 
> dependencies are changed.
> 
> TezClient attempts to load this Manager, under the assumption that if it 
> exists, it is running on Hadoop 2.6 (running on 2.4 is fatal).
> 
> My recommendation would be never to change artifact names (or conditionally 
> choose them) within major releases, but accreting new, optional ones as 
> versions progress is fine.
> 
> thus I would either:
> 
> create a single artifact tez-yarn-timeline-history, compiled with a default 
> dependency on Hadoop 2.6, that includes the Manager. update the TezClient 
> code to fail gracefully if the Manager is not applicable (i.e. the runtime 
> env is Hadoop 2.4).
> 
> or
> 
> offer tez-yarn-timeline-history-with-acls as an optional artifact for Hadoop 
> 2.6 deployments, with the single Manager class in it, which in turn requires 
> the tez-yarn-timeline-history artifact -- which is sufficient for a 2.4 
> runtime. if the user provides the additional -with-acls artifact, they are 
> knowingly going to have problems on Hadoop 2.4.
> 
> I prefer the first as it keeps my build file simple. graceful degradation of 
> services per environment (with appropriate logging) is a well accepted 
> practice.
> 
> and you can now test Tez across multiple versions of Hadoop/YARN at runtime 
> (outside of compile time).
> 
> we do this with Cascading, using simple build file modifications to verify 
> binary compatibility (vendors fork this repo to verify their distributions, 
> and have been known to find critical bugs):
> 
> https://github.com/Cascading/cascading.compatibility
> 
> ckw
> 
>> On Feb 26, 2015, at 11:03 AM, Hitesh Shah wrote:
>> 
>> Hi folks, 
>> 
>> Chris raised a good point earlier about publishing jars for use 
>> against different versions of Hadoop. For the most part, I think we have 
>> done well to ensure that the user-facing modules are version agnostic, but 
>> the same does not hold for other modules, which at times are needed by other 
>> applications for testing.
>> 
>> There aren’t really many different options we can try. The simplest 
>> option I can think of is to build Tez against different versions of 
>> Hadoop with tez.version set to something along the lines of 
>> “tez.version-hadoop.version”. This would imply having artifacts such as 
>> tez-api-0.6.0-hadoop2.4 or tez-api-0.6.0-hadoop2.6. From a usability point of 
>> view, depending on the option we pick, users will need to switch their 
>> dependencies to point to the appropriate version based on the version of 
>> Hadoop they are using. Apps such as Hive and Pig will need to manage 
>> picking a particular version of Tez based on the Hadoop profile they are 
>> building against. 
>> 
>> Any other suggestions for publishing version dependent jars?
>> 
>> For binary releases, should we release only the minimal tarball, or both the 
>> minimal and full tarballs? The full tarball is the recommended deployment 
>> model, as it is more robust to compatibility issues on a changing cluster. It 
>> should work in most scenarios as long as the Hadoop client libraries that 
>> Tez depends on are compatible with the servers running on the cluster.
>> 
>> General questions for the community/past release managers: 
>>  - Should we retain the simple version (i.e. plain x.y.z only) when 
>> building against the default version of Hadoop as determined by Tez? This 
>> “default.version” will have a tendency to evolve over time :). These simple 
>> version jars would be in addition to the version-specific jars. 
>>  - What versions of Hadoop should we compile against? 2.2, 2.4 and 2.6, or 
>> 2.2, 2.3, 2.4, 2.5 and 2.6? Please note that I am ignoring the patch version, 
>> so we should pick the latest release in each line, i.e. 2.2.1 over 2.2.0 if 
>> 2.2.1 exists. 
>> 
>> Any other comments? 
>> 
>> thanks
>> — Hitesh
>> 
>> 
> 
> —
> Chris K Wensel