Hi everyone,

Apologies for the oversight in my previous email; I sent it before filling in the PR placeholder.

Here is the link to the Pull Request: https://github.com/apache/beam/pull/37411

Looking forward to your feedback.

Best regards,
Atharva Ralegankar

On Sun, Jan 25, 2026 at 2:13 AM Atharva Ralegankar <[email protected]> wrote:
>
> Hi Beam Community,
>
> I recently opened a Pull Request to implement parallel reading support
> in SparkReceiverIO (Java SDK).
>
> Currently, SparkReceiverIO is limited to a single worker because it
> initiates reading with a single `Impulse`, creating a bottleneck for
> high-throughput scenarios.
>
> I have submitted a fix that allows users to configure
> `withNumReaders(int)`,
> which distributes reading tasks across multiple workers using a
> `Create.of(shards) + Reshuffle` pattern.
>
> Key details:
> - PR: https://github.com/apache/beam/pull/[YOUR_PR_NUMBER]
> - Issue: https://github.com/apache/beam/issues/37410
> - Impact: Enables horizontal scalability for SparkReceiverIO while
>   maintaining strict backward compatibility.
>
> I would appreciate any feedback or review on this change.
>
> Thanks,
> Atharva Ralegankar
> https://www.linkedin.com/in/atharvaralegankar/
