pskevin commented on pull request #12445:
URL: https://github.com/apache/beam/pull/12445#issuecomment-669026718
I've addressed the initial comments with the exception of ones suggesting
organizational changes. In general this discussion will dictate where the final
(and cleaner) API will reside and how the user will make use of it. Since I
won't be the stakeholder for this feature after a few weeks, I thought it'd be
worthwhile bringing in more opinions – @lostluck @robertwb @youngoli
The challenges and possible solutions are as follows:
- Where should all xlang code be?\
There are multiple moving parts related to the execution of cross-language
transforms which are now condensed into one `TryCrossLanguage` function in
`external.go`
- Allowing various combinations of inputs/outputs (n:n, 1:0, 0:1)
- Correctly adding `ExternalTransform` to `Graph` representing the
unexpanded cross-language transform within the pipeline
- Building the `ExpansionRequest`
- Querying the expansion service for the corresponding
`ExpansionResponse`
- Querying the expansion service for Artifact Resolution and Staging
- Handling the expanded components of the `ExternalTransform`
Decomposing these functions correctly within package(s) dedicated to
`ExternalTransform` whether as `beam/.../xlangx` or `beam/.../xlang` or both
will be ideal. It helps in segregation of responsibilities so that each
function can be developed to maturity independent of each other. I can then
make incremental progress on different fronts without compromising overall
functionality that this PR has already achieved.
Futhermore, `external.go` can then use the package(s) to expose generic
APIs such as `beam.CrossLanguage` (for n:n inputs/outputs),
`beam.CrossLanguageAsSource` (for 0:1 inputs/outputs) and
`beam.CrossLanguageAsSink` (for 1:0 inputs/outputs).
In the future, when the existing API is to be deprecated, these
functions could then be renamed to their `beam.ExternalTransform`...
counterparts. The exisiting API could be renamed to
`beam.LegacyExternalTransform` and implemented using the API that exists as the
outcome of this PR (since it is already backwards compatible for the most
part).
@lostluck already pointed to a possible segregation by removing
pipeline handling code within `universal.go` into `graphx/xlang.go` to be used
in `graphx/translate.go`. I could explore similar options in alignment to the
results of this discussion.
- How is `ExternalTransform` associated to pipeline construction?
- Adding to `Graph` and/or `MultiEdge`
Currently, `graph.Opcode` for external transforms already exists (with
value as `"External"`) and is used in conjunction with `graph.Payload` by the
existing external transform API. Adding `ExternalTransform` to `Graph` will
pull in proto and grpc dependencies within `beam/core/graph` regardless of
whether the pipeline contains any cross-language transform or not. Adding
`ExternalTransform` to `MultiEdge` will need additional logic to distinguish
between an external transform using the proposed v/s current API.
One scanario in which I believe this method could be useful is if
correct pipeline construction required explicit handling of `ExternalTransform`
when each `MultiEdge` is being added during the `graphx.Marshal` call. As of
now I don't feel that is the case.
- Augmenting `Pipeline` and `Scope`
As is implemented in this PR, `ExpandedTransforms
map[string]ExternalTransform` field is added to `Pipeline`. The key is the
`MultiEdge`'s ID that represents an external transform within `Graph`. Using
this, during pipeline construction, the Go proto representing this transform is
swapped with the expanded `ExternalTransform`. All of the components associated
with the expanded `ExternalTransform` are also added to the pipeline proto.
This method does not conflict with any pipeline representation/construction
code for strictly Go pipelines and may be more suitable for the future.
An existing problem worth calling out is the necessity to pass
in the `Pipeline` reference to `beam.CrossLanguage` to add the new
`ExternalTransform` in the `ExpandedTransforms` map. A possible solution to
this would be:
1. Adding `func (p *Pipeline) RegisterExternalTransform(k
string, e *ExternalTransform)` function to `Pipeline`
2. Adding `AddMapEntry func(string, *ExternalTransform)` field
to `Scope`
3. Changing `func (p *Pipeline) Root() Scope` to ` return
Scope{scope: p.real.Root(), real: p.real, AddMapEntry:
p.RegisterExternalTransform}`
`beam.CrossLanguage` should then be able to use
`s.AddMapEntry(key, externalTransform)` instead of the current implementation.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]