Hello all, I’m in the process of implementing a way to write data using a PTransform to Algolia (https://www.algolia.com/ <https://www.algolia.com/>). However, in the process of doing so I’ve run into a bit of a snag, and was curious if someone here would be able to help me figure this out.
Building a DoFn that can accomplish this is straightforward, since it essentially just involves creating a bundle of values and then flushing the batch out to Algolia using their API client as needed. However, I’d like to perform the changes to the index atomically, that is, to either write all of the data or none of the data in the event of a pipeline failure. This can be done in Algolia by moving a temporary index on top of an existing one, like they do here: https://www.algolia.com/doc/tutorials/indexing/synchronization/atomic-reindexing/ <https://www.algolia.com/doc/tutorials/indexing/synchronization/atomic-reindexing/> This is where it gets a bit more tricky. I noted that there exists a @Teardown annotation that allows one to do something like close the client when the DoFn is complete on a given machine, but it doesn’t quite do what I want. In theory, I’d like to write to a temporary index, and then when the transform has been performed on all elements, I then move the index over, completing the operation. I previous implemented this functionality using the Beam Python SDK using the Sink class described here: https://beam.apache.org/documentation/sdks/python-custom-io/ <https://beam.apache.org/documentation/sdks/python-custom-io/> I’m making the transition to the Java SDK because of the built in JDBC I/O transform. However, I’m finding that this Sink API for java is proving elusive, and digging around hasn’t proved fruitful. Specifically, I was looking at this page and it seems like it was directing me to ask here if I’m not sure whether the functionality I desire can be implemented with a DoFn: https://beam.apache.org/documentation/io/authoring-overview/#when-to-implement-using-the-sink-api <https://beam.apache.org/documentation/io/authoring-overview/#when-to-implement-using-the-sink-api> Is there something that can do something similar to what I just described? If there’s something I just missed while digging through the DoFn documentation that’d be great, but I didn’t see anything. Best, Chet
