Hi, Omkar,

Your observation is correct. Currently bundle is implemented in a per-event
basis (Code is DoFnOp.processElement,
https://docs.google.com/spreadsheets/d/1pIUQ8J658B7GPNDt5dwiJBRQyWep1n2rXlkONyA6ZMM/edit#gid=1709587251).
We are working on supporting bundles in Samza right now so in Beam we can
take advantage of it. Bundling is also critical to have better python
performance so we are trying to get it out very soon (Feb-March).

On the other hand, in java if you want to process in a batch fashion, you
can use the Beam GroupIntoBatches api (
https://beam.apache.org/releases/javadoc/2.0.0/org/apache/beam/sdk/transforms/GroupIntoBatches.html).
This will group the elements into batches and then deliver to your ParDo
afterwards. Please let us know whether this works for you.

Thanks,
Xinyu



On Fri, Jan 18, 2019 at 10:27 AM Daniel Chen <dch...@linkedin.com> wrote:

> + Xinyu
>
> On 1/18/19, 10:05 AM, "Deshpande, Omkar" <omkar_deshpa...@intuit.com>
> wrote:
>
>     Hello,
>
>     I am using Samza runner with Apache Beam. Is there any documentation
> available on how bundles are implemented in the Samza runner?
>     I have observed every Kafka record getting processed in its own
> bundle. How can I get larger bundles?
>
>      <beam.version>2.9.0 <samza.version>0.14.1
>
>     Thanks,
>     Omkar
>
>
>

Reply via email to