Hi,
if you take a look on existing IO, most of them doesn't use the Sink API: they
implement a Sink using a DoFn.
I think Algolia would be the same for the Write. What do you think about
updating the index when we finalize a bundle ?
NB: what's the Algolia "client/API" license ? Just to double check that it can
be part of Beam.
Regards
JB
On 11/17/2017 02:28 AM, Chet Aldrich wrote:
Hello all,
I’m in the process of implementing a way to write data using a PTransform to Algolia
(https://www.algolia.com/ <https://www.algolia.com/>). However, in the process
of doing so I’ve run into a bit of a snag, and was curious if someone here would be
able to help me figure this out.
Building a DoFn that can accomplish this is straightforward, since it
essentially just involves creating a bundle of values and then flushing the
batch out to Algolia using their API client as needed.
However, I’d like to perform the changes to the index atomically, that is, to either
write all of the data or none of the data in the event of a pipeline failure. This
can be done in Algolia by moving a temporary index on top of an existing one, like
they do here:
https://www.algolia.com/doc/tutorials/indexing/synchronization/atomic-reindexing/
<https://www.algolia.com/doc/tutorials/indexing/synchronization/atomic-reindexing/>
This is where it gets a bit more tricky. I noted that there exists a @Teardown
annotation that allows one to do something like close the client when the DoFn
is complete on a given machine, but it doesn’t quite do what I want.
In theory, I’d like to write to a temporary index, and then when the transform
has been performed on all elements, I then move the index over, completing the
operation.
I previous implemented this functionality using the Beam Python SDK using the Sink
class described here: https://beam.apache.org/documentation/sdks/python-custom-io/
<https://beam.apache.org/documentation/sdks/python-custom-io/>
I’m making the transition to the Java SDK because of the built in JDBC I/O transform.
However, I’m finding that this Sink API for java is proving elusive, and digging
around hasn’t proved fruitful. Specifically, I was looking at this page and it seems
like it was directing me to ask here if I’m not sure whether the functionality I
desire can be implemented with a DoFn:
https://beam.apache.org/documentation/io/authoring-overview/#when-to-implement-using-the-sink-api
<https://beam.apache.org/documentation/io/authoring-overview/#when-to-implement-using-the-sink-api>
Is there something that can do something similar to what I just described? If
there’s something I just missed while digging through the DoFn documentation
that’d be great, but I didn’t see anything.
Best,
Chet
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com