olehborysevych commented on code in PR #30435:
URL: https://github.com/apache/beam/pull/30435#discussion_r1515349817


##########
learning/prompts/documentation-lookup-nolinks/47_batching_for_external_calls.md:
##########
@@ -0,0 +1,70 @@
+Prompt:
+How to reduce payload when calling external services from my Apache Beam pipeline?
+
+Response:
+To reduce payload when calling external services from your Apache Beam pipeline, you can use batching with the `GroupIntoBatches` transform. Batching aggregates multiple elements into a single payload, which reduces the number of requests sent to the external service and the per-request overhead.
+
+Under the hood, the `GroupIntoBatches` transform uses state and timers to give users precise control over batch size and buffering duration, while abstracting away the implementation details. Key parameters include:
+* `maxBufferingDuration`: controls the maximum time to wait before a batch is emitted.
+* `batchSize`: determines the maximum number of elements in each batch. Elements are buffered until the specified number is reached, then emitted as a batch.
+* `batchSizeBytes` (Java only): limits the byte size of a single batch, determined by the input coder.
+* `elementByteSize` (Java only): computes the byte size of each element with a user-defined function, used instead of the input coder when limiting the byte size of a batch.
+* `withShardedKey()`: enhances parallelism by distributing a single key across multiple threads.

Review Comment:
   `withShardedKey()` should be separated from the other parameters, since it's not a parameter but a function. @dariabezkorovaina



-- 
This is an automated message from the Apache Git Service.