Hi Beam Community,

I recently opened a Pull Request to implement parallel reading support
in SparkReceiverIO (Java SDK).

Currently, SparkReceiverIO is limited to a single worker because reading
is seeded by a single `Impulse` element, which creates a bottleneck in
high-throughput scenarios.

I have submitted a fix that allows users to configure
`withNumReaders(int)` (in `SparkReceiverIO.java`), which distributes
reading tasks across multiple workers using a
`Create.of(shards) + Reshuffle` pattern.
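
For reviewers who want the shape of the approach before opening the PR,
here is a minimal sketch of the `Create.of(shards) + Reshuffle` pattern.
It is illustrative only: the class name, the shard-list loop, and the
placeholder `DoFn` are simplified stand-ins, not the exact code under
review.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.Reshuffle;
import org.apache.beam.sdk.values.PCollection;

public class ParallelReadSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Illustrative: in the PR this value comes from withNumReaders(int).
    int numReaders = 4;

    // One element per reader shard, instead of a single Impulse element.
    List<Integer> shards = new ArrayList<>();
    for (int i = 0; i < numReaders; i++) {
      shards.add(i);
    }

    PCollection<Integer> shardIds =
        p.apply(Create.of(shards))
            // Reshuffle breaks fusion so the runner can schedule each
            // shard on a different worker.
            .apply(Reshuffle.viaRandomKey());

    shardIds.apply(
        ParDo.of(
            new DoFn<Integer, String>() {
              @ProcessElement
              public void processElement(
                  @Element Integer shardId, OutputReceiver<String> out) {
                // Placeholder: each worker would start a receiver for
                // its assigned shard here.
                out.output("reading shard " + shardId);
              }
            }));

    p.run();
  }
}
```

The `Reshuffle.viaRandomKey()` step is what prevents the shard creation
and the downstream read from being fused into a single stage, so runners
are free to place each shard's read on a separate worker.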

Key details:
- PR: https://github.com/apache/beam/pull/[YOUR_PR_NUMBER]
- Issue: https://github.com/apache/beam/issues/37410
- Impact: Enables horizontal scalability for SparkReceiverIO while
maintaining strict backward compatibility.

I would appreciate any feedback or review on this change.

Thanks,
Atharva Ralegankar
https://www.linkedin.com/in/atharvaralegankar/
