+1 - I think this is a good idea and had been considering something similar. I'm happy to help with reviews here, feel free to tag me (my GitHub handle is damccorm).
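For concreteness, here is the kind of thing I have in mind for the proposal quoted below: a rough, purely illustrative sketch that batches by a token budget within each bundle rather than by element count. The class name BatchByTokens and its max_tokens_per_batch/token_fn parameters are hypothetical, not an existing Beam API, and a real implementation would use a tokenizer rather than character length as the cost function.

import apache_beam as beam
from apache_beam.transforms.window import GlobalWindow
from apache_beam.utils.timestamp import MIN_TIMESTAMP
from apache_beam.utils.windowed_value import WindowedValue


class BatchByTokens(beam.DoFn):
    """Groups elements into batches whose summed token cost stays under a budget."""

    def __init__(self, max_tokens_per_batch=4096, token_fn=len):
        # token_fn is a stand-in for a tokenizer-based length function.
        self._max_tokens = max_tokens_per_batch
        self._token_fn = token_fn

    def start_bundle(self):
        self._batch = []
        self._batch_tokens = 0

    def process(self, element):
        cost = self._token_fn(element)
        # Emit the current batch if adding this element would exceed the budget.
        if self._batch and self._batch_tokens + cost > self._max_tokens:
            yield self._batch
            self._batch, self._batch_tokens = [], 0
        self._batch.append(element)
        self._batch_tokens += cost

    def finish_bundle(self):
        # Flush whatever is buffered at the end of the bundle.
        if self._batch:
            yield WindowedValue(self._batch, MIN_TIMESTAMP, [GlobalWindow()])


if __name__ == "__main__":
    with beam.Pipeline() as p:
        _ = (
            p
            | beam.Create(["short text", "a much longer piece of text " * 50])
            | beam.ParDo(BatchByTokens(max_tokens_per_batch=256))
            | beam.Map(print))

A production version would presumably hook into BatchElements' existing buffering and batch-size tuning logic (the "weighted BatchElements" idea) rather than batching per bundle, but the sketch shows the core budget check.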
Thanks,
Danny

On Mon, Jan 26, 2026 at 12:00 PM Robert Bradshaw via dev <[email protected]> wrote:

> +1, a weighted BatchElements would help this case a lot.
>
> On Sun, Jan 25, 2026 at 1:23 AM Elia LIU <[email protected]> wrote:
>
>> Dear Beam Community,
>>
>> My name is Elia, and I am a final-year student interested in contributing to Apache Beam's AI/ML infrastructure for GSoC 2026.
>>
>> I have been exploring RunInference for variable-length workloads, specifically for NLP and LLMs. I noticed that the current batching strategy in BatchElements is primarily count-based, which can lead to inefficient padding, wasted GPU cycles, and unpredictable memory usage (OOMs) when processing variable-length sequences.
>>
>> I propose introducing Content-Aware Batching (or Token-Based Batching) to the ML transform. This would allow batching based on a computational cost metric, such as total tokens, rather than element count. I intend to integrate this with dynamic padding in ModelHandler.
>>
>> I have opened a Feature Request with a conceptual API design for further context: [Feature Request]: RunInference: Content-Aware Dynamic Batching for NLP/LLM Workloads · Issue #37414 · apache/beam <https://github.com/apache/beam/issues/37414>
>>
>> I am planning to draft a design document for this feature and would appreciate any feedback on this approach, or pointers to existing efforts in this direction.
>>
>> Best regards,
>>
>> Elia
