lostluck commented on PR #25568:
URL: https://github.com/apache/beam/pull/25568#issuecomment-1438978316
Ready for another look!
> These may be my personal preferences, but I think the code would be easier
to follow if `RunPipeline`, as the main function in the internal package, was
defined in the beginning of execute.go, followed by the functions it uses
(`runEnvironment`, `executePipeline`, and so on).
>
Fair! I re-arranged a little before sending this out. That's the reverse of
my usual preference (having things defined before they're used). But I agree
that ordering is more approachable for reading how the runner works in this
case.
> It might also make sense to move `stage` from execute.go to preprocess.go,
or to its own file if you would prefer that. As it is now, `executePipeline()`
in execute.go initializes `preprocessor` and invokes
`preprocessor.preProcessGraph()`, which is defined in preprocess.go. Whereas
preprocessor.preProcessGraph() initializes stage, which is defined in
execute.go.
This one is trickier. Preprocess.go only uses a single field in stage: the
slice of transform IDs, everything else is used around this file, and is more
central to execution. So it shouldn't live in preprocess.go IMO.
Preprocess used to live in execute.go but the preprocessor, while simple
now, will eventually handle stage fusion, and become more complicated, so it's
worth splitting to it's own file for testing. A bunch of stuff should probably
moved to the pre-processor too.
But I've ended up moving it to its own file in the end, because it turns out
it's most of the content? Some of it will need to become its own "handler", for
handling messages to the FnAPI. I had clearly lost scope about how much code
was unique to building and managing stages. Ultimately, it'll be a candidate to
move to it's own package along with associated unit testing and documentation.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]