kennknowles opened a new issue, #19076: URL: https://github.com/apache/beam/issues/19076
[https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java](https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java) is the main implementation of ArtifactStagingService. It stages artifacts into a directory, and in practice the staging session token passed to it is such that the directory is different for every job. This leads to two issues:

* The directory doesn't get cleaned up when the job finishes, or even when the JobService shuts down, so disk space leaks when running a lot of jobs (e.g. a suite of ValidatesRunner tests).
* The same artifacts are re-staged repeatedly. Ideally, we should instead identify artifacts that don't need to be staged, based on knowing their md5. The artifact staging protocol has rudimentary support for this but may need to be modified.

CC: [~angoenka]

Imported from Jira [BEAM-4778](https://issues.apache.org/jira/browse/BEAM-4778). Original Jira may contain additional context. Reported by: jkff.
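To illustrate the md5-based idea from the second issue, here is a minimal sketch of a content-addressed staging directory shared across jobs. It is written in Python for brevity and is not Beam code: `DedupStagingDir`, `stage`, and `md5_of` are hypothetical names, and it assumes artifacts are local files that can be keyed by their MD5 digest alone. The real fix would live in the Java staging service and the staging protocol.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Tuple


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


class DedupStagingDir:
    """Hypothetical staging directory keyed by content hash, not by job."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def stage(self, artifact: Path) -> Tuple[Path, bool]:
        """Stage an artifact; return (staged path, True if a copy was made)."""
        target = self.root / md5_of(artifact)
        if target.exists():
            # Same content was staged before (possibly by another job): skip.
            return target, False
        # Copy under a temporary name, then rename, so that readers never
        # observe a partially written artifact under its final name.
        tmp = target.with_suffix(".tmp")
        shutil.copyfile(artifact, tmp)
        tmp.replace(target)
        return target, True
```

Because the staged file name depends only on content, a second job submitting the same jar would reuse the existing copy. This does not by itself address the cleanup issue; a shared directory like this would still need some eviction policy.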
