TheNeuralBit opened a new pull request #17175: URL: https://github.com/apache/beam/pull/17175
This PR adds initial support for https://s.apache.org/batched-dofns to the Python SDK.

The only entirely new module is `apache_beam.typehints.batch`, which adds support for registering `BatchConverter` implementations to convert between element and batch types. The rest of the PR includes changes to `DoFn` (adding a `process_batch` method which can be overridden), as well as "construction-time" changes to `Pipeline`, `PCollection`, `ParDo` and friends to support checking types and resolving `BatchConverter` instances. There are also wide-ranging changes to `apache_beam.runners.worker` to recognize these constructs and create/explode batches between Operations as necessary.

Note this is still a WIP: there are a number of TODOs in the current implementation that I am continuing to address as part of this PR. I also need to add more test coverage (currently there's just one end-to-end test in `FnRunnerTest`).

There are also some planned larger improvements to this framework that will be left for future PRs:
- This implementation assumes `process_batch` always yields batches and `process` always yields elements. We will add `@DoFn.yields_batches` and `@DoFn.yields_elements` decorators to override this default behavior.
- "batch" versions of `DoFn.*Param` as described [here](https://s.apache.org/batched-dofns#heading=h.th8he6p84ja).
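
To make the create/explode behavior described above concrete, here is a minimal pure-Python sketch. It does not import Beam and does not reproduce the API in this PR: `ToyBatchConverter`, `ToyDoFn`, `register`, `resolve`, and `run_chain` are all hypothetical names invented for illustration. It only assumes what the description states, namely that converters are registered per (batch type, element type) pair, that `process_batch` yields batches, and that `process` yields elements.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple


class ToyBatchConverter:
    """Hypothetical stand-in for a BatchConverter (not Beam's class)."""

    def produce_batch(self, elements: Iterable[Any]) -> Any:
        raise NotImplementedError

    def explode_batch(self, batch: Any) -> Iterator[Any]:
        raise NotImplementedError


# Hypothetical registry keyed by (batch type, element type).
_REGISTRY: Dict[Tuple[type, type], Callable[[], ToyBatchConverter]] = {}


def register(batch_type: type, element_type: type):
    """Class decorator that records a converter for a type pair."""
    def decorator(cls):
        _REGISTRY[(batch_type, element_type)] = cls
        return cls
    return decorator


def resolve(batch_type: type, element_type: type) -> ToyBatchConverter:
    """Look up a converter, failing early as a construction-time check would."""
    try:
        return _REGISTRY[(batch_type, element_type)]()
    except KeyError:
        raise TypeError(
            f"no converter for batch={batch_type!r}, element={element_type!r}")


@register(list, int)
class ListOfIntConverter(ToyBatchConverter):
    def produce_batch(self, elements):
        return list(elements)  # copy, so the caller may reuse its buffer

    def explode_batch(self, batch):
        return iter(batch)


class ToyDoFn:
    """Hypothetical DoFn with element-wise and batch-wise entry points."""

    def process(self, element):
        yield element

    def process_batch(self, batch):
        raise NotImplementedError


class DoubleBatch(ToyDoFn):
    def process_batch(self, batch):
        yield [x * 2 for x in batch]  # yields a batch


class AddOne(ToyDoFn):
    def process(self, element):
        yield element + 1  # yields an element


def run_chain(elements, batched_fn, element_fn, converter, batch_size=2):
    """Create batches for batched_fn, then explode them for element_fn."""
    buffer: List[Any] = []
    out: List[Any] = []

    def flush():
        if not buffer:
            return
        batch = converter.produce_batch(buffer)
        buffer.clear()
        for out_batch in batched_fn.process_batch(batch):
            for element in converter.explode_batch(out_batch):
                out.extend(element_fn.process(element))

    for element in elements:
        buffer.append(element)
        if len(buffer) >= batch_size:
            flush()
    flush()  # emit any trailing partial batch
    return out


result = run_chain([1, 2, 3, 4, 5], DoubleBatch(), AddOne(), resolve(list, int))
print(result)
```

The fixed `batch_size` and the in-memory buffer are simplifications chosen for this sketch; how the worker actually sizes and flushes batches is not specified in this description.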
