TheNeuralBit opened a new pull request #17175:
URL: https://github.com/apache/beam/pull/17175


   This PR adds initial support for https://s.apache.org/batched-dofns to the 
Python SDK. The only entirely new module is `apache_beam.typehints.batch` which 
adds support for registering `BatchConverter` implementations to convert 
between element and batch types. The rest of the PR changes `DoFn` 
(adding a `process_batch` method that can be overridden) and makes 
"construction-time" changes to `Pipeline`, `PCollection`, `ParDo` and friends 
to support checking types and resolving `BatchConverter` instances. There are 
also wide-ranging changes to `apache_beam.runners.worker` to recognize these 
constructs and create/explode batches between Operations as necessary.
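
To illustrate the idea of registering converters between element and batch types, here is a minimal standalone sketch. It does not use the `apache_beam.typehints.batch` module from this PR; `ListBatchConverter`, `register_converter`, and `resolve_converter` are hypothetical names chosen for illustration, and the actual `BatchConverter` interface may differ.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple


class ListBatchConverter:
    """Hypothetical converter: a batch of elements is a plain Python list."""

    def produce_batch(self, elements: Iterable[Any]) -> List[Any]:
        # Gather individual elements into a single batch.
        return list(elements)

    def explode_batch(self, batch: List[Any]) -> Iterator[Any]:
        # Turn a batch back into individual elements.
        yield from batch


# Hypothetical registry keyed by (element type, batch type).
_CONVERTERS: Dict[Tuple[type, type], Callable[[], Any]] = {}


def register_converter(
    element_type: type, batch_type: type, factory: Callable[[], Any]
) -> None:
    _CONVERTERS[(element_type, batch_type)] = factory


def resolve_converter(element_type: type, batch_type: type) -> Any:
    # Fail early (at "construction time") if no converter is registered.
    try:
        return _CONVERTERS[(element_type, batch_type)]()
    except KeyError:
        raise TypeError(
            f"No converter registered for {element_type!r} -> {batch_type!r}"
        ) from None


register_converter(int, list, ListBatchConverter)
```

Resolving the converter once while the pipeline is being built, rather than per element, means a missing registration surfaces as a type error before any data is processed.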
   
   Note this is still a WIP - there are a number of TODOs in the current 
implementation that I am continuing to address as part of this PR. I also need 
to add more test coverage (currently there's just one end-to-end test in 
`FnRunnerTest`).
   
   There are also some planned larger improvements to this framework that will 
be left for future PRs:
   - This implementation assumes `process_batch` always yields batches and 
`process` always yields elements. We will add `@DoFn.yields_batches` and 
`@DoFn.yields_elements` decorators to override this default behavior.
   - Adding "batch" versions of `DoFn.*Param`, as described 
[here](https://s.apache.org/batched-dofns#heading=h.th8he6p84ja).
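
The default assumption in the first item (that `process` yields elements and `process_batch` yields batches) can be sketched without Beam. In the snippet below, `MultiplyByTwo` is a stand-in class rather than a real `apache_beam.DoFn`, and `run_batched` is a hypothetical helper that mimics how a worker might create batches before a batched operation and explode them afterwards.

```python
from itertools import islice
from typing import Iterable, Iterator, List


class MultiplyByTwo:
    """Stand-in for a DoFn; not the real apache_beam.DoFn class."""

    def process(self, element: int) -> Iterator[int]:
        # Element-wise path: yields individual elements.
        yield element * 2

    def process_batch(self, batch: List[int]) -> Iterator[List[int]]:
        # Batched path: yields whole batches.
        yield [x * 2 for x in batch]


def run_batched(
    fn: MultiplyByTwo, elements: Iterable[int], batch_size: int
) -> Iterator[int]:
    """Hypothetical bridge: batch the inputs, call process_batch, explode outputs."""
    it = iter(elements)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        for out_batch in fn.process_batch(batch):
            # Explode each output batch so downstream consumers see elements.
            yield from out_batch
```

Both paths should produce the same elements; only the unit of work passed to the user code differs.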
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


