Hey Beam-dev...

I recently spoke with a customer who wanted to run a
read-update-write transform on a Cloud Spanner DB inside a streaming Beam
pipeline. I suggested that they write their own DoFn, and pointed out some
of the pitfalls they would need to avoid (at least those that have already
been found and fixed in the Beam SpannerIO.Write transform!).

This is not the first time I have had this request, and I have been
thinking about how to introduce a generic transactional read-write Spanner
writer: the user would supply a serializable function that takes the input
element and performs the read-update-write, while the transform wraps this
function in the code required to handle the Spanner connection and
transaction, potentially adding batching (running multiple transactions at
once).
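
A minimal sketch of that shape, using only the JDK so it can be run as-is.
Everything in it is hypothetical: `Txn`, `TxnFn`, `FakeDb`, `Transfer` and
`TransactionalWriter` are illustration names, not Beam or Spanner APIs, and
this is not the PoC mentioned in this mail. A real version would presumably
run the user function inside the Spanner client's read-write transaction
(for example via `DatabaseClient.readWriteTransaction().run(...)`) from
within a DoFn.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: an in-memory map stands in for the database.
class TransactionalWriterSketch {

  /** Stand-in for a read-write transaction: reads see committed rows, writes are buffered. */
  interface Txn {
    long read(String key);
    void write(String key, long value);
  }

  /** The user-supplied, serializable read-update-write function. */
  interface TxnFn<T> extends Serializable {
    void apply(Txn txn, T element);
  }

  /** Example input element: move `amount` from one account to another. */
  static class Transfer implements Serializable {
    final String from;
    final String to;
    final long amount;

    Transfer(String from, String to, long amount) {
      this.from = from;
      this.to = to;
      this.amount = amount;
    }
  }

  /** In-memory stand-in for the database; buffered writes are committed together. */
  static class FakeDb {
    private final Map<String, Long> rows = new HashMap<>();

    synchronized void put(String key, long value) { rows.put(key, value); }

    synchronized long get(String key) { return rows.getOrDefault(key, 0L); }

    synchronized <T> void runInTransaction(TxnFn<T> fn, T element) {
      Map<String, Long> buffered = new HashMap<>();
      Txn txn = new Txn() {
        @Override public long read(String key) { return rows.getOrDefault(key, 0L); }
        @Override public void write(String key, long value) { buffered.put(key, value); }
      };
      fn.apply(txn, element);  // if this throws, nothing below runs and nothing is committed
      rows.putAll(buffered);   // "commit": apply all buffered writes at once
    }
  }

  /** Plays the role of the proposed transform: one transaction per input element. */
  static class TransactionalWriter<T> {
    private final FakeDb db;
    private final TxnFn<T> fn;

    TransactionalWriter(FakeDb db, TxnFn<T> fn) {
      this.db = db;
      this.fn = fn;
    }

    void processAll(List<T> elements) {
      for (T element : elements) {
        db.runInTransaction(fn, element);
      }
    }
  }

  /** Example user function: a bank transfer that fails on insufficient funds. */
  static TxnFn<Transfer> transferFn() {
    return (txn, t) -> {
      long balance = txn.read(t.from);
      if (balance < t.amount) {
        throw new IllegalStateException("insufficient funds");
      }
      txn.write(t.from, balance - t.amount);
      txn.write(t.to, txn.read(t.to) + t.amount);
    };
  }

  public static void main(String[] args) {
    FakeDb db = new FakeDb();
    db.put("alice", 100L);
    new TransactionalWriter<>(db, transferFn())
        .processAll(Arrays.asList(new Transfer("alice", "bob", 30L)));
    System.out.println("alice=" + db.get("alice") + " bob=" + db.get("bob"));
    // prints "alice=70 bob=30"
  }
}
```

The point of the split is that the user code only sees the transaction and
the element; connection handling, retries and any batching would live in
the wrapper.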

Would this be something that the community could find useful? Should I
productionize the PoC I have and submit a PR?

In one sense it goes against the 'repeatable
<https://beam.apache.org/documentation/programming-guide/#user-code-idempotence>'
recommendation for DoFns (for example, a transaction that increments a DB
counter would not be idempotent), but in another sense it makes certain
actions more reliable (e.g. processing bank account transfers).
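
To illustrate the idempotence concern, here is a small JDK-only sketch. It
assumes a runner may process the same element more than once (for example
when a bundle is retried); the `Map` is a hypothetical stand-in for a
database table, and the method names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a map stands in for a database table.
class IdempotenceSketch {

  /** Read-update-write: the final state depends on how many times it runs. */
  static void increment(Map<String, Long> table, String key) {
    table.merge(key, 1L, Long::sum);
  }

  /** Blind write: running it again leaves the same state. */
  static void setTo(Map<String, Long> table, String key, long value) {
    table.put(key, value);
  }

  public static void main(String[] args) {
    Map<String, Long> table = new HashMap<>();

    // Simulate the same input element being processed twice.
    increment(table, "counter");
    increment(table, "counter");

    setTo(table, "status", 7L);
    setTo(table, "status", 7L);

    System.out.println(table.get("counter")); // prints 2, not 1
    System.out.println(table.get("status"));  // prints 7
  }
}
```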

All opinions welcome.

-- 
Niel Markwick
Cloud Solutions Architect <https://cloud.google.com/docs/tutorials>
Google Belgium
[email protected]
+32 2 894 6771

Google Belgium NV/SA, Steenweg op Etterbeek 180, 1040 Brussel, Belgie.
RPR: 0878.065.378

