Hi everyone,

Apologies for the oversight in my previous email: I sent it before
filling in the PR placeholder.

Here is the link to the Pull Request: https://github.com/apache/beam/pull/37411

Looking forward to your feedback.

Best regards,
Atharva Ralegankar

On Sun, Jan 25, 2026 at 2:13 AM Atharva Ralegankar
<[email protected]> wrote:
>
> Hi Beam Community,
>
> I recently opened a Pull Request to implement parallel reading support
> in SparkReceiverIO (Java SDK).
>
> Currently, SparkReceiverIO is limited to a single worker because it
> initiates reading with a single `Impulse`, creating a bottleneck for
> high-throughput scenarios.
>
> I have submitted a fix that allows users to configure
> `withNumReaders(int)`, which distributes reading tasks across multiple
> workers using a `Create.of(shards) + Reshuffle` pattern.
>
> Key details:
> - PR: https://github.com/apache/beam/pull/[YOUR_PR_NUMBER]
> - Issue: https://github.com/apache/beam/issues/37410
> - Impact: Enables horizontal scalability for SparkReceiverIO while
> maintaining strict backward compatibility.
>
> I would appreciate any feedback or review on this change.
>
> Thanks,
> Atharva Ralegankar
> https://www.linkedin.com/in/atharvaralegankar/
