stankiewicz commented on PR #38280:
URL: https://github.com/apache/beam/pull/38280#issuecomment-4342144342
> > But why? There is already a GroupIntoBatches transform.
>
> Hi @stankiewicz, GroupIntoBatches is key-based: it batches elements
> per key. It takes an input of [KV.of("key", "value"), ...] and returns
> [KV.of("key", [batch of values associated with only that key])].
>
> BatchElements is a generic batching transform that buffers a `PCollection<T>`
> and returns a `PCollection<[batch of T]>`. It's primarily useful for ML
> inference, such as the RemoteInference transform, where inputs are typically
> individual elements that we need to batch before running inference for
> efficiency. If we used GroupIntoBatches, we'd need to key the elements before
> batching, which is an unnecessary step for that use case. Plus, BatchElements
> can determine the batch size dynamically at runtime.
>
> This is also a direct Java port of the Python
> [BatchElements](https://beam.apache.org/documentation/transforms/python/aggregation/batchelements/).
got it, thanks!
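
For anyone skimming the thread later, the distinction in the quoted reply can be sketched in plain Java. This is not Beam code and does not use the API from this PR: `BatchingSketch`, `batchPerKey`, and `batchElements` are hypothetical names, the inputs are in-memory lists rather than `PCollection`s, and the batch size is fixed, so the dynamic batch sizing mentioned above is not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: hypothetical helpers, not part of Apache Beam.
final class BatchingSketch {

    // Per-key batching, in the spirit of GroupIntoBatches:
    // a batch only ever contains values that share one key.
    static <K, V> List<Map.Entry<K, List<V>>> batchPerKey(
            List<Map.Entry<K, V>> input, int batchSize) {
        Map<K, List<V>> buffers = new LinkedHashMap<>();
        List<Map.Entry<K, List<V>>> out = new ArrayList<>();
        for (Map.Entry<K, V> e : input) {
            List<V> buf = buffers.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
            buf.add(e.getValue());
            if (buf.size() == batchSize) {
                // Emit a full batch for this key and start a new one.
                out.add(Map.entry(e.getKey(), List.copyOf(buf)));
                buf.clear();
            }
        }
        // Flush partially filled buffers, in first-seen key order.
        for (Map.Entry<K, List<V>> b : buffers.entrySet()) {
            if (!b.getValue().isEmpty()) {
                out.add(Map.entry(b.getKey(), List.copyOf(b.getValue())));
            }
        }
        return out;
    }

    // Key-less batching, in the spirit of BatchElements:
    // elements are grouped in arrival order with no keying step.
    static <T> List<List<T>> batchElements(List<T> input, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < input.size(); i += batchSize) {
            int end = Math.min(i + batchSize, input.size());
            out.add(List.copyOf(input.subList(i, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> keyed = List.of(
                Map.entry("a", 1), Map.entry("b", 2),
                Map.entry("a", 3), Map.entry("a", 4));
        System.out.println(batchPerKey(keyed, 2));
        System.out.println(batchElements(List.of(1, 2, 3, 4, 5), 2));
    }
}
```

The per-key variant needs a key on every element before it can do anything, which is the extra step the quoted reply calls unnecessary for inference-style inputs.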