Hey JB, I wasn’t really thinking about open-sourcing the I/O transform with Beam in this case because of the API’s proprietary nature. But then again, I suppose BigQuery and other Google services are similarly proprietary and have transforms included with Beam.
If you feel like it’d be a worthwhile addition, I can ask some people who work there about the licensing and see if this is something that could be done.

Chet

> On Nov 16, 2017, at 9:08 PM, Jean-Baptiste Onofré <[email protected]> wrote:
>
> Hi,
>
> if you take a look at the existing IOs, most of them don't use the Sink API:
> they implement a Sink using a DoFn.
>
> I think Algolia would be the same for the Write. What do you think about
> updating the index when we finalize a bundle?
>
> NB: what's the Algolia "client/API" license? Just to double check that it
> can be part of Beam.
>
> Regards
> JB
>
> On 11/17/2017 02:28 AM, Chet Aldrich wrote:
>> Hello all,
>> I’m in the process of implementing a way to write data using a PTransform to
>> Algolia (https://www.algolia.com/). However, in the process of doing so I’ve
>> run into a bit of a snag, and was curious if someone here would be able to
>> help me figure this out.
>> Building a DoFn that can accomplish this is straightforward, since it
>> essentially just involves creating a bundle of values and then flushing the
>> batch out to Algolia using their API client as needed.
>> However, I’d like to perform the changes to the index atomically, that is,
>> to either write all of the data or none of the data in the event of a
>> pipeline failure. This can be done in Algolia by moving a temporary index on
>> top of an existing one, as they do here:
>> https://www.algolia.com/doc/tutorials/indexing/synchronization/atomic-reindexing/
>> This is where it gets a bit more tricky. I noted that there exists a
>> @Teardown annotation that allows one to do something like close the client
>> when the DoFn is complete on a given machine, but it doesn’t quite do what I
>> want.
>> In theory, I’d like to write to a temporary index, and then when the
>> transform has been performed on all elements, move the index over,
>> completing the operation.
>> I previously implemented this functionality in the Beam Python SDK using
>> the Sink class described here:
>> https://beam.apache.org/documentation/sdks/python-custom-io/
>> I’m making the transition to the Java SDK because of the built-in JDBC I/O
>> transform. However, I’m finding that this Sink API for Java is proving
>> elusive, and digging around hasn’t been fruitful. Specifically, I was
>> looking at this page, and it seems to direct me to ask here if I’m not sure
>> whether the functionality I want can be implemented with a DoFn:
>> https://beam.apache.org/documentation/io/authoring-overview/#when-to-implement-using-the-sink-api
>> Is there something that can do something similar to what I just described?
>> If there’s something I just missed while digging through the DoFn
>> documentation that’d be great, but I didn’t see anything.
>> Best,
>> Chet
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
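
P.S. To make sure I'm following the bundle-based suggestion, here's a rough sketch of the DoFn write I have in mind: batch elements in @ProcessElement and flush when the runner finalizes the bundle. AlgoliaWriteClient, MyRecord, and saveObjects are placeholders I made up for illustration, not the real Algolia Java client API.

    // Sketch only: batches records and flushes them when the bundle is finalized.
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.beam.sdk.transforms.DoFn;

    public class AlgoliaWriteFn extends DoFn<MyRecord, Void> {

      private static final int BATCH_SIZE = 1000;

      // Initialized per worker / per bundle, so they don't need to be serializable.
      private transient AlgoliaWriteClient client;
      private transient List<MyRecord> batch;

      @Setup
      public void setup() {
        // Placeholder: open a client pointed at the temporary index.
        client = new AlgoliaWriteClient(/* appId, apiKey, tempIndexName */);
      }

      @StartBundle
      public void startBundle() {
        batch = new ArrayList<>();
      }

      @ProcessElement
      public void processElement(ProcessContext c) {
        batch.add(c.element());
        if (batch.size() >= BATCH_SIZE) {
          flush();
        }
      }

      @FinishBundle
      public void finishBundle() {
        // Flush whatever is left when the bundle is finalized.
        flush();
      }

      @Teardown
      public void teardown() {
        client.close();
      }

      private void flush() {
        if (!batch.isEmpty()) {
          client.saveObjects(batch);  // placeholder for the real batch-write call
          batch.clear();
        }
      }
    }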
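For the atomic part, my current thinking is that the temporary-to-live index move would have to happen on the driver side, only after the pipeline has finished successfully, roughly like this (again, adminClient and moveIndex are placeholders rather than the real Algolia API):

    // Driver-side sketch: promote the temporary index only if the whole pipeline succeeded.
    // Uses org.apache.beam.sdk.PipelineResult from the Beam Java SDK.
    PipelineResult result = pipeline.run();
    PipelineResult.State state = result.waitUntilFinish();
    if (state == PipelineResult.State.DONE) {
      adminClient.moveIndex("records_tmp", "records");  // placeholder for the atomic move
    } else {
      // Pipeline failed or was cancelled: leave the live index untouched.
    }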
