Eliaaazzz opened a new pull request, #37532:
URL: https://github.com/apache/beam/pull/37532

   Updates #37531
   
   ## Summary
   
   This PR adds an **opt-in, stateless, bundle-local, size-aware batching path** 
for variable-length inference workloads in `RunInference`.
   
   It introduces `SortAndBatchElements` in `apache_beam/transforms/util.py`, 
which:
   
   1. Buffers elements within a bundle (`StartBundle` → `FinishBundle`)
   2. Orders elements by size (default `len(x)`, overridable via 
`element_size_fn`)
   3. Forms batches under existing constraints (`max_batch_size`, 
`max_batch_weight`)
   
   Default behavior remains unchanged unless this path is enabled.
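   
   For context, here is a minimal usage sketch of the intended opt-in path. It is illustrative only: the parameter names follow the description above, but the exact constructor signature in `util.py` may differ.
   
   ```python
   import apache_beam as beam
   from apache_beam.transforms import util
   
   with beam.Pipeline() as p:
       _ = (
           p
           | beam.Create(["a", "bb" * 50, "ccc", "d" * 400])  # variable-length elements
           # Opt-in size-aware batching: buffer per bundle, order by size,
           # then form batches under both count and weight limits.
           | util.SortAndBatchElements(
               max_batch_size=32,
               max_batch_weight=2000,
               element_size_fn=len)
           | beam.Map(print))  # each output element is one batch (a list)
   ```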
   
   ## Motivation
   
   `BatchElements` is count-based. For heavy-tail length distributions, a single 
long outlier can force padding for every short element in its batch, increasing 
tail latency and reducing effective throughput.
   
   This PR provides a stateless (bundle-local, no shuffle) way to improve batch 
composition under variable-length inputs.
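   
   To make the padding effect concrete, a small back-of-the-envelope example (not taken from the benchmark; it simply assumes each batch is padded to the length of its longest element):
   
   ```python
   # One 500-token outlier grouped with 31 one-token elements in a count-based batch:
   lengths = [1] * 31 + [500]
   padded_tokens = len(lengths) * max(lengths)  # 32 * 500 = 16000
   real_tokens = sum(lengths)                   # 531
   print(padded_tokens / real_tokens)           # ~30x padding for this single batch
   ```
   
   Grouping elements of similar length, and cutting batch boundaries by weight, avoids paying the outlier's padding cost on every short element.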
   
   ## Mechanism clarification
   
   A strict-control ablation is included to isolate effects:
   
   * **Fixed boundaries + intra-batch sorting only**: negligible gain on Pareto
   * **Size-aware ordering + `max_batch_weight`**: significant gain
   
   In this workload, the gains are primarily explained by **boundary changes 
under the weight constraint after size-aware ordering**, rather than by 
intra-batch reordering alone.
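   
   For readers who want the second configuration spelled out, the bundle-local logic is roughly the following (a simplified sketch, not the exact implementation; in particular, how `max_batch_weight` treats a single oversize element may differ):
   
   ```python
   def _size_aware_batches(buffer, max_batch_size, max_batch_weight, element_size_fn=len):
       """Sketch: order buffered elements by size, then cut a batch boundary
       whenever adding the next element would exceed the count or weight limit."""
       batch, weight = [], 0
       for element in sorted(buffer, key=element_size_fn):
           size = element_size_fn(element)
           if batch and (len(batch) >= max_batch_size or weight + size > max_batch_weight):
               yield batch
               batch, weight = [], 0
           batch.append(element)
           weight += size
       if batch:
           yield batch
   ```
   
   Because elements are ordered by size before boundaries are cut, the weight limit groups many short elements together and isolates long outliers, which is where the padding reduction comes from.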
   
   ## Benchmark methodology
   
   Script: `apache_beam/transforms/sort_and_batch_benchmark.py`
   
   * N=20 trials (3 warmup trials excluded)
   * Same fixed corpus and seed for A/B
   * Metrics: padding ratio, throughput (Ktok/s), E2E latency, batch p95/p99
   * Invariants checked: element/token conservation (see the sketch after this list)
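   
   The conservation check verifies that batching neither drops nor duplicates data. An illustrative version (not the benchmark's exact code):
   
   ```python
   def check_conservation(elements, batches, size_fn=len):
       flat = [e for batch in batches for e in batch]
       assert len(flat) == len(elements)                              # element conservation
       assert sum(map(size_fn, flat)) == sum(map(size_fn, elements))  # token conservation
   ```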
   
   ## Pareto (heavy-tail) results
   
   Configuration:
   
   * N=10,000 elements
   * `max_batch_size=32`, `max_batch_weight=2000`
   * Input length stats: mean=4.2, median=1, std=18.5, max=500
   
   Baseline → Stateless:
   
   * Padding ratio: `13.00x → 3.19x` (**↓75.5%**)
   * Throughput median: `2.2 → 7.3 Ktok/s` (**↑230.4%**)
   * E2E latency p95: `18924.3 → 5728.6 ms` (**↓69.7%**)
   * Batch latency p95: `283.3 → 64.4 ms` (**↓77.3%**)
   * Batch latency p99: `1011.9 → 632.5 ms` (**↓37.5%**)
   * Batch count: `313 → 321` (+3%)
   * Invariants: elements/tokens matched
   
   ## Scope
   
   Included in this PR:
   
   * Stateless, bundle-local implementation (`SortAndBatchElements`)
   * Unit tests
   * Benchmark + strict-control ablation
   
   Not included in this PR:
   
   * Stateful/global keying strategy
   * Cross-worker/global reordering
   * Auto-tuning or public weighting knobs
   
   ## Files changed
   
   * `apache_beam/transforms/util.py`
   * `apache_beam/transforms/util_test.py`
   * `apache_beam/transforms/sort_and_batch_benchmark.py`
   
   ## Notes
   
   Claims in this PR are scoped to the Pareto heavy-tail setup used above. 
Broader-distribution conclusions and stateful/global strategy are follow-up 
work.
   
   <img width="823" height="341" alt="image" 
src="https://github.com/user-attachments/assets/97b71315-34f1-412b-8e3b-971af0ac0f5e";
 />
   <img width="956" height="1023" alt="image" 
src="https://github.com/user-attachments/assets/f7e2c445-f4a1-448a-a96d-83f9d954c2ac";
 />
   
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
    - [x] Mention the appropriate issue in your description (for example: 
`addresses #123`), if applicable. This will automatically add a link to the 
pull request in the issue. If you would like the issue to automatically close 
on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
   

