Eric Badger created TEZ-3240:
--------------------------------

             Summary: Improvements to tez.lib.uris to allow for multiple 
tarballs and mixing tarballs and jars. 
                 Key: TEZ-3240
                 URL: https://issues.apache.org/jira/browse/TEZ-3240
             Project: Apache Tez
          Issue Type: Improvement
            Reporter: Eric Badger
            Assignee: Eric Badger


Currently, tez.lib.uris only supports either a single archive or paths for 
multiple jars. You cannot mix and match between the two and you also cannot 
specify more than one archive. This means that you cannot specify both the tez 
and mapreduce archives. In the case where there is already a mapreduce archive 
in the distributed cache, you would not be able to use it when running tez. 
Instead, you would have to include the mapreduce jars in the single archive 
that you give to tez.lib.uris or use the mapreduce jars that are on the cluster 
node itself. Either workaround makes it very easy for the mapreduce jars that 
a job uses to fall out of sync with each other. 

With the current implementation, during a rolling upgrade it is very easy to 
have jobs that do not get the same mapreduce jars across all of the containers, 
since some will start after the node's jars have been upgraded and some will 
start before. 

If, instead, the job uses an archive that packages both tez and mapreduce 
together, then you will have 2 copies of the mapreduce jars in the distributed 
cache and will also have to upgrade both whenever you make a single upgrade to 
mapreduce. 

I propose 2 improvements:
1) Allow tez.lib.uris to take an arbitrary number of archives and jars, while 
not being limited to only one or the other
2) Allow tez.lib.uris to specify a fragment following the '#' symbol (as is 
done in mapreduce) that will create a symlink with the name of the fragment.
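
To make the two proposals concrete, here is a minimal sketch in Python of how a 
comma-separated value mixing archives and jars, each with an optional '#' 
fragment, could be interpreted. This is not the Tez implementation: the 
function name, the archive suffix list, the example HDFS paths, and the rule 
of falling back to the file name when no fragment is given are all assumptions 
made for illustration.

```python
from urllib.parse import urlsplit
import posixpath

# Assumed set of suffixes that mark an entry as an archive (illustrative only).
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar", ".zip")


def parse_lib_uris(value):
    """Split a tez.lib.uris-style value into one entry per URI.

    Hypothetical helper: each entry records the URI without its fragment,
    whether it looks like an archive, and the symlink name to use.
    """
    entries = []
    for raw in value.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate stray commas
        parts = urlsplit(raw)
        is_archive = parts.path.endswith(ARCHIVE_SUFFIXES)
        entries.append({
            # URI with the '#fragment' part removed
            "uri": raw.split("#", 1)[0],
            "kind": "archive" if is_archive else "jar_or_dir",
            # Fragment names the symlink; otherwise assume the base file name
            "link_name": parts.fragment or posixpath.basename(parts.path),
        })
    return entries


if __name__ == "__main__":
    # Hypothetical paths: two archives with fragments plus one plain jar.
    example = (
        "hdfs:///apps/tez/tez.tar.gz#tez,"
        "hdfs:///apps/mapreduce/mapreduce.tar.gz#mapreduce,"
        "hdfs:///apps/extra/custom.jar"
    )
    for entry in parse_lib_uris(example):
        print(entry["kind"], entry["link_name"], entry["uri"])
```

With a value like the one above, a job could reference the mapreduce archive 
that is already in the distributed cache instead of repackaging its jars into 
the tez archive.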



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
