[
https://issues.apache.org/jira/browse/BEAM-4291?focusedWorklogId=113390&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-113390
]
ASF GitHub Bot logged work on BEAM-4291:
----------------------------------------
Author: ASF GitHub Bot
Created on: 19/Jun/18 21:32
Start Date: 19/Jun/18 21:32
Worklog Time Spent: 10m
Work Description: jkff commented on a change in pull request #5676:
[BEAM-4291] Propagates artifact retrieval token in Flink runner and to the Java
harness
URL: https://github.com/apache/beam/pull/5676#discussion_r196585622
##########
File path: model/fn-execution/src/main/proto/beam_provision_api.proto
##########
@@ -67,6 +67,10 @@ message ProvisionInfo {
// (optional) Resource limits that the SDK harness worker should respect.
// Runners may -- but are not required to -- enforce any limits provided.
Resources resource_limits = 4;
+
+ // (required) The artifact retrieval token produced by
+ // ArtifactStagingService.CommitManifestResponse.
+ string retrieval_token = 6;
Review comment:
This design was discussed some time ago (see also
https://github.com/apache/beam/pull/5582). We start from the assumption that
all services are (or at least may be) globally distributed and stateless, i.e.
we do not rely on there being exactly one ArtifactRetrievalService per worker
or per harness. Without that guarantee, each ArtifactRetrievalService call has
to be linked to the job it concerns. Likewise, ArtifactStagingService needs to
know which job each call concerns.
We decided to do this by propagating tokens:
- PrepareJob returns a token used for ArtifactStagingService calls
- ArtifactStagingService.CommitManifest returns a token used for
ArtifactRetrievalService calls
This token is an opaque string containing the information necessary for the
service to do its job. In practice, with the "distributed file system" based
implementations of both services, we're using (basically) a base path as the
token.
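The token flow described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: the function names, the
in-memory FAKE_DFS dict and the path layout are hypothetical and are not the
actual Beam implementation; the only thing taken from the comment is that the
token is (basically) a base path.

```python
import posixpath

# Hypothetical in-memory stand-in for a distributed file system: path -> bytes.
FAKE_DFS = {}


def prepare_job(job_name, staging_root):
    """Returns a staging token; in this sketch it is just a per-job base path."""
    return posixpath.join(staging_root, job_name)


def put_artifact(staging_token, name, data):
    """Staging call: the token alone tells the service where to write."""
    FAKE_DFS[posixpath.join(staging_token, "artifacts", name)] = data


def commit_manifest(staging_token, names):
    """Writes a manifest and returns the retrieval token (here, the same base path)."""
    FAKE_DFS[posixpath.join(staging_token, "MANIFEST")] = "\n".join(names).encode()
    return staging_token


def get_manifest(retrieval_token):
    """Retrieval call: no per-job state is needed besides the token."""
    return FAKE_DFS[posixpath.join(retrieval_token, "MANIFEST")].decode().split("\n")


def get_artifact(retrieval_token, name):
    return FAKE_DFS[posixpath.join(retrieval_token, "artifacts", name)]
```

Callers treat both tokens as opaque strings; only the services interpret them.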
Alternatively, we could explicitly include the job ID in the RPCs, but that
would require the services to do some sort of global lookup of artifact
placement parameters based on the job ID. It seems easier to include the
necessary parameters explicitly in the token.
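To make the trade-off concrete, here is a rough side-by-side sketch of the two
options. Both classes, the DFS dict and the path layout are hypothetical and
exist only for illustration; they do not mirror real Beam classes.

```python
import posixpath

# Hypothetical stand-in for a distributed file system: full path -> bytes.
DFS = {"/staging/job-1/artifacts/a.jar": b"abc"}


class JobIdRetrievalService:
    """Job-ID variant: needs a registry mapping job IDs to artifact locations."""

    def __init__(self, placement_registry):
        # Shared state that every replica of the service would have to see.
        self._registry = placement_registry

    def get_artifact(self, job_id, name):
        base_path = self._registry[job_id]  # the global lookup
        return DFS[posixpath.join(base_path, "artifacts", name)]


class TokenRetrievalService:
    """Token variant: the token carries the placement parameters itself."""

    def get_artifact(self, retrieval_token, name):
        # No registry: any replica can serve the call from the token alone.
        return DFS[posixpath.join(retrieval_token, "artifacts", name)]
```

The job-ID variant only works where the registry is reachable, while the token
variant keeps the service stateless.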
Now, since the harness needs a retrieval token to talk to the retrieval
service, it seems reasonable to include it in the provision info. It is not
part of the service descriptor because it does not identify the service; it
only supplies a required argument for the service's RPCs.
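A minimal sketch of that split, assuming simplified stand-ins for the proto
messages: the dataclasses and build_artifact_requests below are hypothetical
and carry only the fields needed for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiServiceDescriptor:
    """Identifies where the retrieval service lives; carries no job context."""
    url: str


@dataclass(frozen=True)
class ProvisionInfo:
    """Per-job provisioning data handed to the harness, including the token."""
    retrieval_token: str


@dataclass(frozen=True)
class GetArtifactRequest:
    """Hypothetical request shape: the token travels as an RPC argument."""
    name: str
    retrieval_token: str


def build_artifact_requests(provision_info, artifact_names):
    """Builds one request per artifact, each tagged with the job's token."""
    return [
        GetArtifactRequest(name=n, retrieval_token=provision_info.retrieval_token)
        for n in artifact_names
    ]
```

The descriptor says which endpoint to call, and the provision info says what
to pass to it.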
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 113390)
Time Spent: 6h 50m (was: 6h 40m)
> ArtifactRetrievalService that retrieves artifacts from a distributed
> filesystem
> -------------------------------------------------------------------------------
>
> Key: BEAM-4291
> URL: https://issues.apache.org/jira/browse/BEAM-4291
> Project: Beam
> Issue Type: Sub-task
> Components: runner-core
> Reporter: Eugene Kirpichov
> Assignee: Axel Magnuson
> Priority: Major
> Fix For: 2.6.0
>
> Time Spent: 6h 50m
> Remaining Estimate: 0h
>
> In agreement with how they are staged in BEAM-4290.