lostluck commented on issue #23893: URL: https://github.com/apache/beam/issues/23893#issuecomment-1789353630
I've settled on an approach that works for both PTransform metadata (Annotations, DisplayData) and for Environment metadata (ResourceHints, DisplayData, Dependencies).

However, while plumbing environments isn't too bad, we need to change a lot about how the "base environment" is determined. Pipeline construction presently "back updates" the default environment after making everything a proto, by searching for the known environment id. That way we can defer cross compiling to afterwards, if necessary. But this doesn't work if the Go pipeline itself is generating multiple environments that all use the same binary.

The way I want to fix all this is to *not* do back updates after the fact, and instead pass the base Environment properly filled in ahead of marshalling. This avoids a bunch of wacky logic we have that tries to defer doing things correctly until afterwards.

We can't simply extend the existing GoWorkerRole scanning mechanism, because in the future we may have a Go Expansion Service which would naturally use that role for its own environments. Those would likely not be the same binary, so we can't scan everything and make the substitutions. We might be able to add a "dummy" role that we're replacing, but that just feels like adding a hack to continue to support a hack. I'd rather clean it up.

Some of this will end up being breaking changes around the runnerlib (Execute will no longer also do the cross compile), but we only have two runnerlib callers (universal and dataflow), and it's very unlikely that someone is silently depending on this API, as the correct approach is sending jobs in through the "universal" package instead. But that's a discussion that can happen on *that* PR, not the PTransform one.
