Ganeshsivakumar commented on PR #38280:
URL: https://github.com/apache/beam/pull/38280#issuecomment-4342037940
> but why? There is GroupIntoBatches transform already.
Hi @stankiewicz, GroupIntoBatches is key-based: it batches elements per key.
It takes an input of `[KV.of("key", "value"), ...]` and returns
`[KV.of("key", [batch of values associated with only that key]), ...]`.
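To make the per-key shape concrete, here is a conceptual sketch that uses plain Java collections, not the Beam API. The class and method names are hypothetical, and it only mimics what GroupIntoBatches produces: each key's values are split into batches of at most `batchSize`, and a batch never mixes keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (plain collections, not Beam): mimics the per-key
// output shape of GroupIntoBatches. Values are batched separately for each key.
final class PerKeyBatchingSketch {

  // Splits each key's values, in arrival order, into batches of at most batchSize.
  static <K, V> Map<K, List<List<V>>> batchPerKey(List<Map.Entry<K, V>> input, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    Map<K, List<List<V>>> out = new LinkedHashMap<>();
    for (Map.Entry<K, V> kv : input) {
      List<List<V>> batches = out.computeIfAbsent(kv.getKey(), k -> new ArrayList<>());
      // Open a new batch for this key if none exists yet or the current one is full.
      if (batches.isEmpty() || batches.get(batches.size() - 1).size() == batchSize) {
        batches.add(new ArrayList<>());
      }
      batches.get(batches.size() - 1).add(kv.getValue());
    }
    return out;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> input = List.of(
        Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3), Map.entry("a", 4));
    // Prints {a=[[1, 3], [4]], b=[[2]]}
    System.out.println(batchPerKey(input, 2));
  }
}
```

Note that "b" ends up in its own single-element batch even though there was room in a batch of "a": that per-key isolation is exactly what an unkeyed batching transform avoids.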
BatchElements is a generic batching transform that buffers a `PCollection<T>`
and returns a `PCollection<[batch of T]>`. It's primarily useful for ML
inference, such as the RemoteInference transform, where inputs are typically
individual elements and we need to batch them before running inference for
efficiency. If we used GroupIntoBatches, we'd have to key the elements before
batching, which is an unnecessary step for that use case. Plus, BatchElements
can determine the batch size dynamically at runtime.
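As a rough illustration of unkeyed batching with a runtime-adjusted batch size, here is a sketch in plain Java, not the Beam API and not the PR's implementation. Everything in it is hypothetical: `batch`, the `timeBatch` callback (which stands in for running and timing the downstream work, e.g. inference, on one batch), and the resize rule. The Python transform uses its own timing-based estimator; the rule here (double the size after a batch that finishes under `targetMillis`, halve it otherwise, clamped to `[minSize, maxSize]`) is a deliberately simpler stand-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical sketch (plain collections, not Beam): batches an unkeyed
// List<T> into List<List<T>> and adapts the target batch size between batches.
final class AdaptiveBatchingSketch {

  // timeBatch stands in for "process this batch downstream and report elapsed millis".
  static <T> List<List<T>> batch(List<T> input, int minSize, int maxSize,
      long targetMillis, ToLongFunction<List<T>> timeBatch) {
    if (minSize <= 0 || maxSize < minSize) {
      throw new IllegalArgumentException("need 0 < minSize <= maxSize");
    }
    List<List<T>> out = new ArrayList<>();
    int size = minSize; // start small, grow while batches stay fast
    List<T> current = new ArrayList<>();
    for (T element : input) {
      current.add(element);
      if (current.size() == size) {
        long elapsed = timeBatch.applyAsLong(current);
        out.add(current);
        current = new ArrayList<>();
        // Simple stand-in rule: grow on fast batches, shrink on slow ones.
        size = elapsed < targetMillis
            ? Math.min(size * 2, maxSize)
            : Math.max(size / 2, minSize);
      }
    }
    if (!current.isEmpty()) { // flush the final partial batch
      timeBatch.applyAsLong(current);
      out.add(current);
    }
    return out;
  }

  public static void main(String[] args) {
    List<Integer> input = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
    // Simulated cost of 30 ms per element keeps the example deterministic.
    List<List<Integer>> batches = batch(input, 1, 4, 100, b -> 30L * b.size());
    // Prints [[0], [1, 2], [3, 4, 5, 6], [7, 8], [9]]
    System.out.println(batches);
  }
}
```

The size grows from 1 to 4 while batches finish under the 100 ms target, then drops back to 2 after the 4-element batch takes 120 ms. No key is needed anywhere, which is the point of a keyless batching transform for inference-style pipelines.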
This is also a direct Java port of the Python
[BatchElements](https://beam.apache.org/documentation/transforms/python/aggregation/batchelements/) transform.