kennknowles opened a new issue, #19076:
URL: https://github.com/apache/beam/issues/19076

   
   [`BeamFileSystemArtifactStagingService`](https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java) is the main implementation of `ArtifactStagingService`.
   
   It stages artifacts into a directory, and in practice the staging session
token that is passed in makes that directory different for every job. This
leads to two issues:
    * The directory is not cleaned up when the job finishes, or even when the
JobService shuts down, so disk space leaks when running many jobs
(e.g. a suite of ValidatesRunner tests).
    * We repeatedly re-stage the same artifacts. Ideally, we should recognize,
based on their MD5 checksums, that some artifacts are already staged and skip
them. The artifact staging protocol has rudimentary support for this, but it
may need to be modified.
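A minimal sketch of how both points could be addressed, assuming a hypothetical content-addressed layout: each artifact is stored once in a shared directory under a file named by its MD5 hex digest, and per-job directories are deleted when the job ends. The class `Md5ArtifactStore` and its methods `isStaged`, `stage`, and `deleteRecursively` are illustrative names only; they are not part of Beam's `ArtifactStagingService` API or the artifact staging protocol.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical MD5-keyed artifact store; a sketch, not Beam's ArtifactStagingService API. */
public class Md5ArtifactStore {
  private final Path root;

  public Md5ArtifactStore(Path root) throws IOException {
    // Shared across jobs, so identical artifacts are stored only once.
    this.root = Files.createDirectories(root);
  }

  /** A client could call this with a precomputed MD5 to skip uploading an artifact. */
  public boolean isStaged(String md5Hex) {
    return Files.isRegularFile(root.resolve(md5Hex));
  }

  /** Stages the stream under its MD5 hex digest and returns the digest; drops duplicates. */
  public String stage(InputStream in) throws IOException {
    MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
    // Write to a temp file first, hashing the bytes as they are copied.
    Path tmp = Files.createTempFile(root, "upload-", ".tmp");
    try (DigestInputStream din = new DigestInputStream(in, md5)) {
      Files.copy(din, tmp, StandardCopyOption.REPLACE_EXISTING);
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md5.digest()) {
      hex.append(String.format("%02x", b & 0xff));
    }
    Path target = root.resolve(hex.toString());
    if (Files.exists(target)) {
      Files.delete(tmp); // Same content already staged: discard the copy.
    } else {
      Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
    return hex.toString();
  }

  /** Deletes a per-job directory bottom-up, e.g. when the job finishes or the service stops. */
  public static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(dir)) {
      // Reverse lexicographic order visits children before their parents.
      for (Path p : (Iterable<Path>) paths.sorted(Comparator.reverseOrder())::iterator) {
        Files.delete(p);
      }
    }
  }
}
```

In this sketch a client would send an artifact's MD5 first and upload only when `isStaged` returns false; `stage` still recomputes the digest so the stored name always matches the content. One trade-off of a shared store is that its blobs outlive any single job, so cleaning it up would need something like reference counting or periodic garbage collection rather than deleting a per-job directory. The sketch also does not handle two concurrent uploads of the same content.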
   
   CC: [~angoenka]
   
   Imported from Jira 
[BEAM-4778](https://issues.apache.org/jira/browse/BEAM-4778). Original Jira may 
contain additional context.
   Reported by: jkff.


-- 
This is an automated message from the Apache Git Service.