lukecwik commented on a change in pull request #11203: [BEAM-9577] Define and
implement dependency-aware artifact staging service.
URL: https://github.com/apache/beam/pull/11203#discussion_r400470967
##########
File path: model/job-management/src/main/proto/beam_artifact_api.proto
##########
@@ -31,8 +31,92 @@ option java_outer_classname = "ArtifactApi";
import "beam_runner_api.proto";
-// A service to stage artifacts for use in a Job.
+// A service to retrieve artifacts for use in a Job.
+service ArtifactRetrievalService {
+ // Resolves the given artifact references into one or more replacement
+ // artifact references (e.g. a Maven dependency into a (transitive) set
+ // of jars.
+ rpc ResolveArtifact(ResolveArtifactRequest) returns
(ResolveArtifactResponse);
+
+ // Retrieves the given artifact as a stream of bytes.
+ rpc GetArtifact(GetArtifactRequest) returns (stream GetArtifactResponse);
+}
+
+// A service that allows the client to act as an ArtifactRetrievalService,
+// for a particular job with the server initiating requests and receiving
+// responses.
+//
+// A client calls the service with an ArtifactResponseWrapper that has the
+// staging token set, and thereafter responds to the server's requests.
service ArtifactStagingService {
+ rpc ReverseArtifactRetrievalService(stream ArtifactResponseWrapper)
+ returns (stream ArtifactRequestWrapper);
+}
+
+// A request for artifact resolution.
+message ResolveArtifactRequest {
+ // A set of artifacts to (jointly) resolve.
+ repeated org.apache.beam.model.pipeline.v1.ArtifactInformation artifacts = 1;
+
+ // A set of artifact type urns that are understood by the requester.
+ // An attempt should be made to resolve the artifacts in terms of these URNs,
+ // but other URNs may be used as well with the understanding that they must
+ // be fetch-able as bytes via GetArtifact.
+ repeated string preferred_urns = 2;
+}
+
+// A response for artifact resolution.
+message ResolveArtifactResponse {
+ // A full set of replacements for the set of requested artifacts, preferably
+ // in terms of the requested type URNs. If there is no better resolution,
+ // the original list is returned.
+ repeated org.apache.beam.model.pipeline.v1.ArtifactInformation replacements
= 1;
+
+ // (Optional) If set, used to indicate the artifacts are mutually
inconsistent
+ // (e.g. due to a diamond dependency problem) or could otherwise not be
+ // resolved (e.g. due to invalid specifications).
+ string error = 2;
+}
+
+// A request to get an artifact.
+message GetArtifactRequest {
+ org.apache.beam.model.pipeline.v1.ArtifactInformation artifact = 1;
Review comment:
Should we have a resume_offset field?
Whenever we need to stage a large amount of data, it seems as though the SDK
would reconnect so a "retry" would be possible and should be able to pass in a
point to resume from for a large stream.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services