jbsabbagh commented on code in PR #32082:
URL: https://github.com/apache/beam/pull/32082#discussion_r1725290346


##########
sdks/python/apache_beam/transforms/util.py:
##########
@@ -802,6 +802,16 @@ class BatchElements(PTransform):
   corresponding to its contents. Each batch is emitted with a timestamp at
   the end of their window.
 
+  When the max_batch_duration_secs arg is provided, a stateful implementation
+  of BatchElements is used to batch elements across bundles. This is most
+  impactful in streaming applications where many bundles only contain one
+  element. Larger max_batch_duration_secs values can reduce the throughput of
+  the transform, while smaller values will improve the throughput but make it
+  more likely that batches are smaller than the target batch size. 
+
+  For more information on tuning parameters to this transform, see
+  https://beam.apache.org/documentation/patterns/batch-elements

Review Comment:
   (Totally optional) 
   
   I think it might be useful to gently guide users toward low values. Folks
   will instinctively reason about this value in whole seconds and then be
   surprised at how much that slows down the pipeline.
   
   ```suggestion
     When the max_batch_duration_secs arg is provided, a stateful implementation
     of BatchElements is used to batch elements across bundles. This is most
     impactful in streaming applications where many bundles only contain one
     element. Larger max_batch_duration_secs values can reduce the throughput of
     the transform, while smaller values will improve the throughput but make it
     more likely that batches are smaller than the target batch size. 
   
     As a general recommendation, start with low values (e.g. 0.005, i.e. 5 ms)
     and increase as needed to get the desired tradeoff between target batch
     size and latency or throughput.
   
     For more information on tuning parameters to this transform, see
     https://beam.apache.org/documentation/patterns/batch-elements
   ```
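
To illustrate the tradeoff described in the suggestion, here is a toy, pure-Python sketch. It is not how BatchElements is implemented: `batch_with_deadline` and its flush-on-next-arrival logic are hypothetical, and only simulate a size limit plus a deadline measured from the first buffered element. A real stateful implementation would more likely use a timer to flush at the deadline instead of waiting for the next element.

```python
def batch_with_deadline(arrivals, max_batch_size, max_batch_duration_secs):
    """Group (arrival_time_secs, element) pairs into batches.

    Hypothetical helper for illustration only. A batch is flushed when it
    reaches max_batch_size, or when the next element arrives at least
    max_batch_duration_secs after the batch's first element.
    """
    batches = []
    current = []
    batch_start = None
    for arrival_time, element in arrivals:
        # The deadline passed before this element arrived: flush the buffer.
        if current and arrival_time - batch_start >= max_batch_duration_secs:
            batches.append(current)
            current = []
        # An empty buffer means this element starts a new batch.
        if not current:
            batch_start = arrival_time
        current.append(element)
        # The size limit was reached: flush immediately.
        if len(current) >= max_batch_size:
            batches.append(current)
            current = []
    # Flush whatever is left at the end of the input.
    if current:
        batches.append(current)
    return batches


# Simulated stream: one element every 2 ms, 20 elements in total.
arrivals = [(i * 0.002, i) for i in range(20)]

small = batch_with_deadline(
    arrivals, max_batch_size=10, max_batch_duration_secs=0.005)
large = batch_with_deadline(
    arrivals, max_batch_size=10, max_batch_duration_secs=1.0)

print([len(b) for b in small])  # [3, 3, 3, 3, 3, 3, 2]
print([len(b) for b in large])  # [10, 10]
```

With one element every 2 ms, the 5 ms deadline caps batches at 3 elements, while the 1 s deadline lets batches fill to the size limit, at the cost of holding the first element of each batch longer before it is emitted.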


