William Lo created GOBBLIN-2135:
-----------------------------------

             Summary: Cache Yarn jars in GobblinYarnAppLauncher
                 Key: GOBBLIN-2135
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-2135
             Project: Apache Gobblin
          Issue Type: Improvement
            Reporter: William Lo

The Gobblin YARN Application Launcher lacks some functionality available in MRJobLauncher. One of the biggest gaps in feature parity is the absence of jar caching: MRJobLauncher creates a monthly cache that is automatically cleaned up by subsequent executions two months later.

YARN/MR requires jars to be uploaded to HDFS, and this step can be quite slow (~15 mins for a sizeable job to upload all of its jars). Since many jobs share the same jars, it makes sense to cache them together and provide YARN only the shared path.

We also want to ensure that SNAPSHOT jars and other files are not uploaded to the cache, since, unlike versioned jars on Artifactory, they are not immutable.
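
To make the proposal concrete, here is a minimal sketch of the three decisions described above: choosing a monthly cache directory, deciding whether a file may be cached, and deciding when an old monthly directory is stale. It is not the actual Gobblin implementation. The class name, method names, the `yyyy-MM` directory layout, and the two-month retention parameter are all assumptions made for illustration, and the sketch only computes paths; it does not touch HDFS.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; none of these names come from Gobblin itself.
public class JarCacheSketch {
    // Assumed layout: one cache directory per calendar month, e.g. <root>/2024-09
    private static final DateTimeFormatter MONTH_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM");

    /** Returns the shared cache directory for the given month. */
    public static String monthlyCacheDir(String cacheRoot, YearMonth month) {
        return cacheRoot + "/" + month.format(MONTH_FORMAT);
    }

    /** Only non-SNAPSHOT jars are treated as immutable, and therefore cacheable. */
    public static boolean isCacheable(String fileName) {
        return fileName.endsWith(".jar") && !fileName.contains("SNAPSHOT");
    }

    /** Cacheable jars go to the shared monthly directory; everything else stays per-application. */
    public static String destinationFor(String fileName, String cacheDir, String appWorkDir) {
        String parent = isCacheable(fileName) ? cacheDir : appWorkDir;
        return parent + "/" + fileName;
    }

    /** A monthly directory is stale once it is at least retentionMonths older than the current month. */
    public static boolean isStale(YearMonth cacheMonth, YearMonth current, int retentionMonths) {
        return !cacheMonth.isAfter(current.minusMonths(retentionMonths));
    }

    public static void main(String[] args) {
        YearMonth now = YearMonth.of(2024, 9);
        String cacheDir = monthlyCacheDir("/tmp/gobblin-jar-cache", now);
        System.out.println(destinationFor("guava-20.0.jar", cacheDir, "/tmp/app_0001"));
        System.out.println(destinationFor("my-job-1.0-SNAPSHOT.jar", cacheDir, "/tmp/app_0001"));
        System.out.println(isStale(YearMonth.of(2024, 7), now, 2));
    }
}
```

With a scheme like this, only the first launch in a given month would pay the upload cost for a shared jar; later launches would find the file already present under the monthly directory and hand YARN the existing path.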