Thanks Robert and Danny!

@Robert - I'll definitely look into the weighted BatchElements approach. That sounds like the right direction for handling variable-length inputs such as token sequences.

@Danny - That would be great! I'll tag you (@damccorm) once I have the PR ready. I'm starting the implementation now.
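To make sure I'm on the right track, here is a rough, self-contained sketch of the batching behavior I have in mind. This is purely illustrative: TokenBudgetBatchFn, token_count_fn, and max_batch_tokens are placeholder names for this email, not existing BatchElements parameters, and the real implementation would presumably extend BatchElements itself along the lines Robert suggested.

# Illustrative sketch only -- not existing Beam API. Groups elements into
# batches capped by a total-token budget instead of an element count.
import apache_beam as beam
from apache_beam.transforms.window import GlobalWindow
from apache_beam.utils.timestamp import MAX_TIMESTAMP
from apache_beam.utils.windowed_value import WindowedValue


class TokenBudgetBatchFn(beam.DoFn):
  """Buffers elements and emits a batch once a token budget would be exceeded."""

  def __init__(self, token_count_fn, max_batch_tokens):
    self._token_count_fn = token_count_fn  # e.g. lambda s: len(s.split())
    self._max_batch_tokens = max_batch_tokens

  def start_bundle(self):
    self._batch = []
    self._batch_tokens = 0

  def process(self, element):
    cost = self._token_count_fn(element)
    # Flush the current batch if adding this element would exceed the budget.
    if self._batch and self._batch_tokens + cost > self._max_batch_tokens:
      yield self._batch
      self._batch = []
      self._batch_tokens = 0
    self._batch.append(element)
    self._batch_tokens += cost

  def finish_bundle(self):
    # Emit whatever is left over at the end of the bundle.
    if self._batch:
      yield WindowedValue(self._batch, MAX_TIMESTAMP, [GlobalWindow()])
      self._batch = []


# Usage (the lambda is a stand-in for a real tokenizer's length function):
#   batches = texts | beam.ParDo(
#       TokenBudgetBatchFn(token_count_fn=lambda s: len(s.split()),
#                          max_batch_tokens=512))

Like BatchElements, a DoFn written this way only batches within a bundle; the design document will cover how the token budget interacts with the existing BatchElements heuristics and with RunInference's ModelHandler.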
On 2026/01/26 17:06:25 Danny McCormick via dev wrote:
> +1 - I think this is a good idea and had been considering something
> similar. I'm happy to help with reviews here, feel free to tag me (my
> GitHub handle is damccorm).
>
> Thanks,
> Danny
>
> On Mon, Jan 26, 2026 at 12:00 PM Robert Bradshaw via dev <
> [email protected]> wrote:
>
> > +1, a weighted BatchElements would help this case a lot.
> >
> > On Sun, Jan 25, 2026 at 1:23 AM Elia LIU <[email protected]> wrote:
> >
> >> Dear Beam Community,
> >>
> >> My name is Elia, and I am a final-year student interested in contributing
> >> to Apache Beam's AI/ML infrastructure for GSoC 2026.
> >>
> >> I have been exploring RunInference for variable-length workloads,
> >> specifically within NLP and LLMs. I noticed that the current batching
> >> strategy in BatchElements is primarily count-based, which can lead to
> >> inefficient padding, compute waste on GPU cycles, and unpredictable memory
> >> usage (OOMs) when processing variable-length sequences.
> >>
> >> I propose introducing Content-Aware Batching (or Token-Based Batching) to
> >> the ML transform. This would allow batching based on a computational cost
> >> metric, such as total tokens, rather than element count. I intend to
> >> integrate this with dynamic padding in ModelHandler.
> >>
> >> I have opened a Feature Request with a conceptual API design for further
> >> context here: [Feature Request]: RunInference: Content-Aware Dynamic
> >> Batching for NLP/LLM Workloads · Issue #37414 · apache/beam
> >> <https://github.com/apache/beam/issues/37414>
> >>
> >> I am planning to draft a design document for this feature and would
> >> appreciate any feedback on this approach or information regarding existing
> >> efforts in this direction.
> >>
> >> Best regards,
> >>
> >> Elia
> >>
> >
>
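One more note on the second half of the quoted proposal: by dynamic padding in ModelHandler I mean padding each batch to the longest sequence in that batch rather than to a fixed model maximum. A tiny illustration below, where pad_batch and pad_id are made-up names for this email, not part of any ModelHandler today.

from typing import List

def pad_batch(token_id_batch: List[List[int]], pad_id: int = 0) -> List[List[int]]:
  # Pad to the longest sequence in this batch, not a global max length.
  max_len = max(len(seq) for seq in token_id_batch)
  return [seq + [pad_id] * (max_len - len(seq)) for seq in token_id_batch]

# A batch with sequence lengths 3 and 5 is padded to 5, not to e.g. 512.
print(pad_batch([[101, 7592, 102], [101, 7592, 2088, 999, 102]]))

Combined with a token budget at batching time, the padded size of each batch stays roughly bounded, which is what should make GPU memory usage more predictable.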
