stankiewicz commented on PR #38280:
URL: https://github.com/apache/beam/pull/38280#issuecomment-4342144342

   > > But why? There is already a GroupIntoBatches transform.
   > 
   > Hi @stankiewicz, GroupIntoBatches is key-based: it batches elements 
per key. It takes an input of [KV.of("key", "value"), ...] and returns 
[KV.of("key", [batch of values associated with only that key]), ...].
   > 
   > BatchElements is a generic batching transform that buffers a 
`PCollection<T>` and returns a `PCollection<[batch of T]>`. It's primarily 
useful for ML inference, such as the RemoteInference transform, where inputs 
are typically individual elements that need to be batched before running 
inference for efficiency. With GroupIntoBatches we'd have to key the elements 
before batching, which is an unnecessary step for that use case. In addition, 
BatchElements can determine the batch size dynamically at runtime.
   > 
   > This is also a direct Java port of Python's 
[BatchElements](https://beam.apache.org/documentation/transforms/python/aggregation/batchelements/).
   
   got it, thanks! 
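
For readers skimming the thread, the distinction drawn above can be sketched in plain Python. This is only an illustration of the two batching shapes, not Beam code: the function names `group_into_batches_sketch` and `batch_elements_sketch` are hypothetical, the batch size is fixed (the real BatchElements can tune it at runtime), and element order is assumed to be preserved, which a distributed runner does not guarantee.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")
T = TypeVar("T")


def group_into_batches_sketch(
    pairs: Iterable[Tuple[K, V]], batch_size: int
) -> Iterator[Tuple[K, List[V]]]:
    """Per-key batching: each emitted batch holds values of one key only."""
    buffers: Dict[K, List[V]] = defaultdict(list)
    for key, value in pairs:
        buffers[key].append(value)
        if len(buffers[key]) == batch_size:
            # Emit the full batch and drop the key's buffer.
            yield key, buffers.pop(key)
    # Flush whatever is left over for each key.
    for key, values in buffers.items():
        yield key, values


def batch_elements_sketch(
    elements: Iterable[T], batch_size: int
) -> Iterator[List[T]]:
    """Generic batching: no keys are needed, elements are grouped as they arrive."""
    batch: List[T] = []
    for element in elements:
        batch.append(element)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Flush the final, possibly smaller, batch.
    if batch:
        yield batch


keyed = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
per_key_batches = list(group_into_batches_sketch(keyed, batch_size=2))
plain_batches = list(batch_elements_sketch([1, 2, 3, 4, 5], batch_size=2))
```

In the per-key version the caller has to supply a key for every element first; the generic version takes the elements as they are, which is the extra step the quoted reply calls unnecessary for inference inputs.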

