Hello all, 

I’m implementing a PTransform that writes data to Algolia 
(https://www.algolia.com/). In the process I’ve run into a bit of a snag, 
and I’m hoping someone here can help me figure it out. 

Building a DoFn that can accomplish this is straightforward, since it 
essentially just involves creating a bundle of values and then flushing the 
batch out to Algolia using their API client as needed. 
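
As a rough illustration of that batching logic, here is a plain-Java sketch. 
It is not actual Beam or Algolia code: BatchingWriter and its flush callback 
are hypothetical names, and in a real DoFn I would expect add() to be called 
from the @ProcessElement method and flush() from @FinishBundle. 

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper modelling the buffer-and-flush body of a batching DoFn.
// Neither the class nor the callback is part of the Beam or Algolia APIs.
class BatchingWriter<T> {
    private final int maxBatchSize;
    private final Consumer<List<T>> flushFn; // stand-in for a call to the API client
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int maxBatchSize, Consumer<List<T>> flushFn) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.flushFn = flushFn;
    }

    // Buffers one element and flushes as soon as the batch is full.
    void add(T element) {
        buffer.add(element);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    // Sends whatever is buffered (if anything) and empties the buffer.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flushFn.accept(new ArrayList<>(buffer)); // hand over a copy of the batch
        buffer.clear();
    }
}
```

The final flush() matters: without it, a partially filled batch at the end of 
a bundle would never be sent. 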

However, I’d like to perform the changes to the index atomically, that is, to 
either write all of the data or none of the data in the event of a pipeline 
failure. Algolia supports this by moving a temporary index on top of an 
existing one, as described here: 
https://www.algolia.com/doc/tutorials/indexing/synchronization/atomic-reindexing/

This is where it gets trickier. I noticed that there is a @Teardown 
annotation that lets one do something like close the client when a DoFn 
instance is finished on a given machine, but it doesn’t quite do what I want, 
since it runs per DoFn instance rather than once after the whole transform 
has completed. 

Ideally, I’d write to a temporary index and then, once the transform has 
processed all elements, move the index over to complete the operation.
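
To make the intended flow concrete, here is a minimal plain-Java sketch of the 
write-then-move pattern. Everything in it is an assumption for illustration: 
InMemoryIndexStore is an in-memory stand-in rather than the Algolia client, 
AtomicReindex and the "_tmp" suffix are made-up names, and it runs in a single 
process, so it says nothing about how to trigger the final move in Beam. 

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a search service (not the Algolia client API).
class InMemoryIndexStore {
    private final Map<String, List<String>> indices = new HashMap<>();

    // Appends a batch to the named index, creating the index if needed.
    void addObjects(String index, List<String> batch) {
        indices.computeIfAbsent(index, k -> new ArrayList<>()).addAll(batch);
    }

    // Replaces the destination with the source in one step and removes the source.
    void moveIndex(String source, String destination) {
        List<String> data = indices.remove(source);
        if (data == null) {
            throw new IllegalStateException("no such index: " + source);
        }
        indices.put(destination, data);
    }

    void deleteIndex(String index) {
        indices.remove(index);
    }

    List<String> get(String index) {
        return indices.getOrDefault(index, List.of());
    }
}

class AtomicReindex {
    // Writes every batch to a temporary index and swaps it in only if all writes succeed.
    static void run(InMemoryIndexStore store, String liveIndex, List<List<String>> batches) {
        String tmpIndex = liveIndex + "_tmp"; // hypothetical naming convention
        store.deleteIndex(tmpIndex);          // discard leftovers from a failed run
        store.addObjects(tmpIndex, List.of()); // make sure the temporary index exists
        try {
            for (List<String> batch : batches) {
                store.addObjects(tmpIndex, batch);
            }
        } catch (RuntimeException e) {
            store.deleteIndex(tmpIndex);      // the live index is left untouched
            throw e;
        }
        store.moveIndex(tmpIndex, liveIndex); // the single "commit" step
    }
}
```

The part I can’t see how to express with a DoFn is that last moveIndex call, 
which has to happen exactly once, after every write has finished. 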

I previously implemented this functionality in the Beam Python SDK using the 
Sink class described here: 
https://beam.apache.org/documentation/sdks/python-custom-io/

I’m making the transition to the Java SDK because of its built-in JDBC I/O 
transform. However, the equivalent Sink API for Java is proving elusive, 
and digging around hasn’t been fruitful. Specifically, I was looking at the 
page below, which seems to direct me to ask here if I’m not sure whether 
the functionality I want can be implemented with a DoFn: 
https://beam.apache.org/documentation/io/authoring-overview/#when-to-implement-using-the-sink-api

Is there anything in the Java SDK that can do what I just described? If I 
simply missed something while digging through the DoFn documentation, a 
pointer would be great, but I didn’t see anything. 

Best, 

Chet 


