[ https://issues.apache.org/jira/browse/BEAM-4778?focusedWorklogId=125841&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-125841 ]
ASF GitHub Bot logged work on BEAM-4778:
----------------------------------------
Author: ASF GitHub Bot
Created on: 23/Jul/18 02:14
Start Date: 23/Jul/18 02:14
Worklog Time Spent: 10m
Work Description: ryan-williams commented on a change in pull request #5958: [BEAM-4778] add option to flink job server to clean staged artifacts per-job
URL: https://github.com/apache/beam/pull/5958#discussion_r204254489
##########
File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactRetrievalService.java
##########
@@ -182,6 +182,10 @@ static ProxyManifest loadManifest(String retrievalToken) throws IOException {
     LOG.info("Loading manifest for retrieval token {}", retrievalToken);
     // look for manifest file at $retrieval_token
     ResourceId manifestResourceId = getManifestLocationFromToken(retrievalToken);
+    return loadManifest(manifestResourceId);
+  }
+
+  public static ProxyManifest loadManifest(ResourceId manifestResourceId) throws IOException {
Review comment:
done
Issue Time Tracking
-------------------
Worklog Id: (was: 125841)
Time Spent: 2h (was: 1h 50m)
> Less wasteful ArtifactStagingService
> ------------------------------------
>
> Key: BEAM-4778
> URL: https://issues.apache.org/jira/browse/BEAM-4778
> Project: Beam
> Issue Type: Bug
> Components: runner-core
> Reporter: Eugene Kirpichov
> Assignee: Ryan Williams
> Priority: Major
> Time Spent: 2h
> Remaining Estimate: 0h
>
> [https://github.com/apache/beam/blob/master/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/artifact/BeamFileSystemArtifactStagingService.java]
> is the main implementation of ArtifactStagingService.
> It stages artifacts into a directory, and in practice the staging session
> token passed to it is such that the directory is different for every job.
> This leads to two issues:
> * The directory is not cleaned up when the job finishes, or even when the
> JobService shuts down, so disk space leaks when running many jobs
> (e.g. a suite of ValidatesRunner tests).
> * The same artifacts are re-staged repeatedly. Ideally, we should instead
> identify artifacts that don't need to be staged, based on their md5. The
> artifact staging protocol has rudimentary support for this but may need to
> be modified.
> CC: [~angoenka]
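For illustration only, the two issues in the description above could be approached roughly as follows. This is a minimal sketch that assumes a local filesystem and plain java.nio rather than Beam's FileSystems API. The class StagingSketch and its methods md5Hex, stageIfAbsent and cleanUp are hypothetical names; they are not part of Beam or of pull request #5958. The sketch names each staged file after its MD5 so that an artifact already present is skipped, and it deletes a per-job directory once the job is done.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical sketch: none of these names exist in Beam.
final class StagingSketch {

  /** Returns the lowercase hex MD5 of a file's contents. */
  static String md5Hex(Path file) throws IOException {
    MessageDigest md5;
    try {
      md5 = MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      // MD5 is one of the algorithms every Java platform must provide.
      throw new IllegalStateException(e);
    }
    try (InputStream in = new DigestInputStream(Files.newInputStream(file), md5)) {
      byte[] buffer = new byte[8192];
      while (in.read(buffer) != -1) {
        // Reading through the stream feeds the digest.
      }
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md5.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  /**
   * Copies the artifact into stagingDir under its MD5 as the file name,
   * unless a file with that name is already there.
   * Returns true if a copy was made, false if staging was skipped.
   */
  static boolean stageIfAbsent(Path artifact, Path stagingDir) throws IOException {
    Files.createDirectories(stagingDir);
    Path target = stagingDir.resolve(md5Hex(artifact));
    if (Files.exists(target)) {
      return false;
    }
    Files.copy(artifact, target);
    return true;
  }

  /** Deletes a staging directory and everything under it, if it exists. */
  static void cleanUp(Path jobDir) throws IOException {
    if (!Files.exists(jobDir)) {
      return;
    }
    try (Stream<Path> paths = Files.walk(jobDir)) {
      // Reverse order visits children before their parent directories.
      paths.sorted(Comparator.reverseOrder()).forEach(p -> {
        try {
          Files.delete(p);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }

  public static void main(String[] args) throws IOException {
    Path jobDir = Files.createTempDirectory("staging-sketch");
    Path artifact = Files.createTempFile("artifact", ".jar");
    Files.write(artifact, "hello".getBytes(StandardCharsets.UTF_8));

    System.out.println(stageIfAbsent(artifact, jobDir)); // true: first copy
    System.out.println(stageIfAbsent(artifact, jobDir)); // false: same MD5 already staged
    cleanUp(jobDir);
    Files.delete(artifact);
    System.out.println(Files.exists(jobDir)); // false: directory removed
  }
}
```

A real implementation would need to go through Beam's filesystem abstraction and the staging protocol's manifest, and would have to decide whether deduplicated artifacts live in a directory shared across jobs (which conflicts with deleting everything per job) or are tracked some other way. The sketch leaves that design question open.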
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)