William Lo created GOBBLIN-2135:
-----------------------------------

             Summary: Cache Yarn jars in GobblinYarnAppLauncher
                 Key: GOBBLIN-2135
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2135
             Project: Apache Gobblin
          Issue Type: Improvement
            Reporter: William Lo


The Gobblin YARN application launcher (GobblinYarnAppLauncher) lacks some 
functionality available in MRJobLauncher. One of the biggest gaps in feature 
parity is jar caching: MRJobLauncher creates a monthly cache that is 
automatically cleaned up by executions that run two months later.

YARN/MR requires uploading jars to HDFS. This step can be quite slow (~15 mins 
for a sizeable job to upload all of its jars), and since many jobs share the 
same jars, it makes sense to cache them together and give YARN only the 
shared path.
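
As a rough illustration of the monthly cache idea, here is a minimal sketch in 
plain Java. The class name, the <cacheRoot>/<yyyy-MM>/<jarName> layout, and the 
retention parameter are all assumptions made for this example; they are not 
taken from MRJobLauncher or GobblinYarnAppLauncher.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, for illustration only; not part of Gobblin.
final class MonthlyJarCacheSketch {
    private static final DateTimeFormatter MONTH_FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM");

    // Assumed layout: <cacheRoot>/<yyyy-MM>/<jarName>, so every job launched
    // in the same month resolves a given jar to the same shared path.
    static String cachedJarPath(String cacheRoot, YearMonth month, String jarName) {
        return cacheRoot + "/" + month.format(MONTH_FORMAT) + "/" + jarName;
    }

    // A monthly cache directory is treated as expired once it is at least
    // retentionMonths old relative to the current month.
    static boolean isExpired(YearMonth cacheMonth, YearMonth currentMonth,
                             int retentionMonths) {
        return !cacheMonth.plusMonths(retentionMonths).isAfter(currentMonth);
    }
}
```

With a retention of 2, a launcher running in March 2024 would consider the 
January 2024 directory eligible for cleanup and keep the February one.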

We also want to ensure that SNAPSHOT jars and other files are not uploaded to 
the cache, since, unlike jar versions on Artifactory, they are not immutable.
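
One possible way to express this filter is sketched below in plain Java. The 
class and method names are hypothetical, and matching on the file name (a 
".jar" suffix with no "SNAPSHOT" marker) is an assumption for illustration; 
the actual change may decide cacheability differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Gobblin.
final class CacheableJarFilterSketch {
    // Assumption: only versioned (non-SNAPSHOT) jars are immutable enough to
    // share; SNAPSHOT jars and non-jar files are uploaded per job instead.
    static boolean isCacheable(String fileName) {
        return fileName.endsWith(".jar") && !fileName.contains("SNAPSHOT");
    }

    // Returns the subset of file names that may go to the shared cache,
    // preserving the input order.
    static List<String> selectCacheable(List<String> fileNames) {
        List<String> cacheable = new ArrayList<>();
        for (String name : fileNames) {
            if (isCacheable(name)) {
                cacheable.add(name);
            }
        }
        return cacheable;
    }
}
```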



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
