Hey Beam-dev... I recently had an interaction with a customer who wanted to run a read-update-write transform on a Cloud Spanner DB inside a streaming Beam pipeline. I suggested they write their own DoFn and pointed them at some of the pitfalls they need to avoid (at least those that have been found and fixed in the Beam SpannerIO.Write transform!).
This is not the first time I have had this request, so I have been thinking about how to introduce a generic transactional read-write Spanner writer. The user would supply a serializable function that takes the input element and performs the read-update-write, while the transform wraps this function in the code required to handle the Spanner connection and transaction, potentially adding batching (running multiple transactions at once).

Would the community find this useful? Should I productionize the PoC I have and submit a PR?

In one sense it goes against the 'repeatable <https://beam.apache.org/documentation/programming-guide/#user-code-idempotence>' recommendation for a DoFn (for example, a transaction that increments a DB counter would not be idempotent), but in another sense it makes certain actions more reliable (e.g. processing bank account transfers). All opinions welcome.

--
Niel Markwick
Cloud Solutions Architect <https://cloud.google.com/docs/tutorials>
Google Belgium
[email protected]
+32 2 894 6771

Google Belgium NV/SA, Steenweg op Etterbeek 180, 1040 Brussel, Belgie. RPR: 0878.065.378
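To make the shape of the proposal and the idempotence trade-off concrete, here is a minimal, self-contained Java sketch. It models the database in memory; `ReadUpdateWriteFn`, `Txn`, `Database` and `writeAll` are hypothetical names chosen for illustration and are not the Beam or Cloud Spanner client APIs. Each element is applied in its own all-or-nothing transaction, and replaying a bundle (as a runner may do after a worker failure) shows why a counter increment is not idempotent.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Illustrative in-memory model of the proposed transactional read-update-write transform.
 * All names here are hypothetical; this does NOT use the Beam or Cloud Spanner client APIs.
 */
public class TransactionalWriteSketch {

  /** Hypothetical transaction handle: reads and writes go to a private working copy. */
  public static final class Txn {
    private final Map<String, Long> working;

    Txn(Map<String, Long> working) {
      this.working = working;
    }

    public long read(String key) {
      return working.getOrDefault(key, 0L);
    }

    public void write(String key, long value) {
      working.put(key, value);
    }
  }

  /** User-supplied read-update-write logic; Serializable so a runner could ship it to workers. */
  @FunctionalInterface
  public interface ReadUpdateWriteFn<T> extends Serializable {
    void apply(T element, Txn txn);
  }

  /** Toy database that commits each transaction all-or-nothing. */
  public static final class Database {
    private final Map<String, Long> rows = new HashMap<>();

    public synchronized void runTransaction(Consumer<Txn> body) {
      Map<String, Long> working = new HashMap<>(rows); // buffer all changes
      body.accept(new Txn(working)); // an exception here discards the working copy
      rows.clear();
      rows.putAll(working); // commit the buffered changes
    }

    public synchronized long get(String key) {
      return rows.getOrDefault(key, 0L);
    }
  }

  /** The wrapper: runs the user function once per element, each in its own transaction. */
  public static <T> void writeAll(Database db, List<T> elements, ReadUpdateWriteFn<T> fn) {
    for (T element : elements) {
      db.runTransaction(txn -> fn.apply(element, txn));
    }
  }

  public static void main(String[] args) {
    Database db = new Database();
    ReadUpdateWriteFn<String> incrementCounter = (key, txn) -> txn.write(key, txn.read(key) + 1);

    List<String> bundle = Arrays.asList("clicks");
    writeAll(db, bundle, incrementCounter);
    // Simulate the runner retrying the same bundle after a worker failure.
    writeAll(db, bundle, incrementCounter);
    System.out.println(db.get("clicks")); // prints 2, although "clicks" was in the input only once
  }
}
```

The sketch uses one transaction per element for simplicity; the batching idea would group several elements' transactions together. In a real implementation the wrapper would presumably delegate to the Spanner client's read-write transaction runner, which can itself re-run the user function when a transaction is aborted, so the function should not have side effects outside the transaction.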
