Hi,

if you take a look on existing IO, most of them doesn't use the Sink API: they implement a Sink using a DoFn.

I think Algolia would be the same for the Write. What do you think about updating the index when we finalize a bundle ?

NB: what's the Algolia "client/API" license ? Just to double check that it can be part of Beam.

Regards
JB

On 11/17/2017 02:28 AM, Chet Aldrich wrote:
Hello all,

I’m in the process of implementing a way to write data using a PTransform to Algolia 
(https://www.algolia.com/ <https://www.algolia.com/>). However, in the process 
of doing so I’ve run into a bit of a snag, and was curious if someone here would be 
able to help me figure this out.

Building a DoFn that can accomplish this is straightforward, since it 
essentially just involves creating a bundle of values and then flushing the 
batch out to Algolia using their API client as needed.

However, I’d like to perform the changes to the index atomically, that is, to either 
write all of the data or none of the data in the event of a pipeline failure. This 
can be done in Algolia by moving a temporary index on top of an existing one, like 
they do here: 
https://www.algolia.com/doc/tutorials/indexing/synchronization/atomic-reindexing/ 
<https://www.algolia.com/doc/tutorials/indexing/synchronization/atomic-reindexing/>

This is where it gets a bit more tricky. I noted that there exists a @Teardown 
annotation that allows one to do something like close the client when the DoFn 
is complete on a given machine, but it doesn’t quite do what I want.

In theory, I’d like to write to a temporary index, and then when the transform 
has been performed on all elements, I then move the index over, completing the 
operation.

I previous implemented this functionality using the Beam Python SDK using the Sink 
class described here: https://beam.apache.org/documentation/sdks/python-custom-io/ 
<https://beam.apache.org/documentation/sdks/python-custom-io/>

I’m making the transition to the Java SDK because of the built in JDBC I/O transform. 
However, I’m finding that this Sink API for java is proving elusive, and digging 
around hasn’t proved fruitful. Specifically, I was looking at this page and it seems 
like it was directing me to ask here if I’m not sure whether the functionality I 
desire can be implemented with a DoFn: 
https://beam.apache.org/documentation/io/authoring-overview/#when-to-implement-using-the-sink-api
 
<https://beam.apache.org/documentation/io/authoring-overview/#when-to-implement-using-the-sink-api>
 

Is there something that can do something similar to what I just described? If 
there’s something I just missed while digging through the DoFn documentation 
that’d be great, but I didn’t see anything.

Best,

Chet





--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com

Reply via email to