olehborysevych commented on code in PR #30435:
URL: https://github.com/apache/beam/pull/30435#discussion_r1515349817
##########
learning/prompts/documentation-lookup-nolinks/47_batching_for_external_calls.md:
##########
@@ -0,0 +1,70 @@
+Prompt:
+How to reduce payload when calling external services from my Apache Beam pipeline?
+
+Response:
+To reduce payload when calling external services from your Apache Beam pipeline, you can employ batching techniques by using the `GroupIntoBatches` transform. Batching involves aggregating multiple elements into a single payload, reducing the number of requests sent to the external service and minimizing overhead.
+
+Under the hood, the `GroupIntoBatches` transform utilizes state and timers to grant users precise control over batch size and buffering duration parameters, while abstracting away the implementation details. Key parameters include:
+* `maxBufferDuration`: controls the maximum waiting time for a batch to be emitted.
+* `batchSize`: determines the maximum number of elements in each batch. Elements are buffered until the specified number is reached, then emitted as a batch.
+* `batchSizeBytes` (Java only): limits the byte size of a single batch, determined by the input coder.
+* `elementByteSize` (Java only): sets the byte size of a single batch using a user-defined function.
+* `withShardedKey()`: enhances parallelism by distributing a single key across multiple threads.

Review Comment:
`withShardedKey()` should be separated from the other parameters, since it is not a parameter but a function. @dariabezkorovaina
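
For readers following the thread, here is a minimal plain-Python sketch of the batching behaviour the `batchSize` bullet describes: values are buffered per key and emitted once the buffer reaches the configured size, with any remainder flushed at the end. It is only an illustration of the semantics and not Beam's implementation; the function name `group_into_batches` and the sample data are hypothetical, and time-based flushing (`maxBufferDuration`), byte-size limits, and key sharding are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple


def group_into_batches(
    pairs: Iterable[Tuple[Hashable, object]], batch_size: int
) -> Iterator[Tuple[Hashable, List[object]]]:
    """Hypothetical helper: buffer values per key and emit fixed-size batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    buffers: Dict[Hashable, List[object]] = defaultdict(list)
    for key, value in pairs:
        buffers[key].append(value)
        if len(buffers[key]) == batch_size:
            # Buffer is full: emit it and start a fresh one for this key.
            yield key, buffers[key]
            buffers[key] = []

    # End of input: flush whatever is left, even if smaller than batch_size.
    for key, remainder in buffers.items():
        if remainder:
            yield key, remainder


# Six keyed elements become four payloads instead of six separate calls.
events = [("a", 1), ("a", 2), ("b", 10), ("a", 3), ("a", 4), ("a", 5)]
batches = list(group_into_batches(events, batch_size=2))
```

Each emitted `(key, values)` pair stands for one request to the external service, which is where the reduction in call count comes from. Splitting one hot key across workers, the job of `withShardedKey()`, is a separate concern from these size settings, which is consistent with the point raised in the review comment.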
