Hi Adalbert,

Because of how runners schedule work for splittable DoFns, starting all
splits at the same time isn't really supported. In addition, Beam generally
assumes that a split can be retried in isolation from other splits, which
SingleStore parallel read doesn't appear to support.
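
To make the retry concern concrete, here is a rough sketch in plain Python.
It is not Beam or SingleStore code; ReadOncePartition, run_split and
retry_in_isolation are hypothetical names used only to model a partition
that can be read once and a runner that retries a single failed split.

```python
# Hypothetical model only: no Beam or SingleStore APIs are used here.

class ReadOncePartition:
    """A partition whose rows can be handed out only once (Single-read-like)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._consumed = False

    def read(self):
        if self._consumed:
            raise RuntimeError("partition already read")
        self._consumed = True
        return list(self._rows)


def run_split(partition, fail_after_read=False):
    """Process one split; optionally simulate a worker failure after reading."""
    rows = partition.read()
    if fail_after_read:
        raise IOError("simulated worker failure")
    return rows


def retry_in_isolation(partition):
    """Retry only the failed split, without restarting any other split."""
    try:
        return run_split(partition, fail_after_read=True)
    except IOError:
        try:
            # Second attempt on the same split, as a runner would do.
            return run_split(partition)
        except RuntimeError as err:
            return "retry failed: " + str(err)


result = retry_in_isolation(ReadOncePartition([1, 2, 3]))
print(result)
```

In this model the isolated retry cannot succeed because the partition was
already consumed by the first attempt, which is the mismatch described
above.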

That said, this looks really promising, so I'd be happy to get on a call to
help better understand your design, and see if we can find a solution.

John

On Thu, Aug 25, 2022 at 10:16 AM Adalbert Makarovych <
amakarovych...@singlestore.com> wrote:

> Hello,
>
> I'm working on the SingleStore IO connector and would like to discuss it
> with Beam developers.
> It would be great if the connector could use SingleStore parallel read
> <https://docs.singlestore.com/managed-service/en/query-data/query-procedures/read-query-results-in-parallel.html>.
> In the ideal case, the connector should use Single-read mode as it is
> faster than Multiple-read and consumes much less memory.
>
> One of the problems is that in Single-read mode, each reader must initiate
> its read query before any reader receives data. Is it possible to
> somehow configure Beam to start all DoFns at the same time? Or to get the
> number of started DoFns at runtime?
>
> The other problem is that Single-read allows reading data from a partition
> only once, so if one reading thread fails, all the others must be restarted
> to retry. Is it possible to achieve this behavior? Or to at least
> fail gracefully without additional retries?
>
> Here are the first drafts of the design documentation:
> <https://docs.google.com/document/d/1WU-hkoZ93SaGXyOz_UtX0jXzIRl194hCId_IdmEV9jw/edit?usp=sharing>
> I would appreciate any help with this stuff :)
>
> --
> Adalbert Makarovych
> Software Engineer at SingleStore
