Hi Gabriel,

Write-side adapters tend to be easier to implement than read-side adapters. That said, from the Neptune documentation it looks to me like there is no direct data load API, only a batch load from a file on S3 <https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-data.html>. This is usable, but perhaps a bit more difficult to work with.

You could implement a write-side adapter for Neptune (either on your own or as a contribution to Beam) by writing a standard DoFn which buffers received records in memory in its ProcessElement method and, in its FinishBundle method, writes all collected records to a file on S3, notifies Neptune, and waits for Neptune to ingest them. You can see documentation on the DoFn API here <https://beam.apache.org/releases/javadoc/2.28.0/org/apache/beam/sdk/transforms/DoFn.html>. Someone else here might have more experience working with microbatch-style APIs like this, and could have more suggestions.

A read-side adapter would likely be only slightly more work. This could be done in a simple loading step (Create with a single element followed by MapElements), although much of the complexity likely lies in how to provide the necessary properties for the cluster construction <https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-java.html> on the Beam worker, and how to define the query the user would need to execute. I'd also wonder if this could be done in an engine-agnostic way, "TinkerPopIO" instead of "NeptuneIO".

If you'd like to pursue adding such an integration, https://beam.apache.org/contribute/ provides documentation on the contribution process. Contributions to Beam are always appreciated!

-Daniel

On Thu, Apr 15, 2021 at 12:44 AM Gabriel Levcovitz <[email protected]> wrote:

> Dear Beam Dev community,
>
> I'm working on a project where we have a graph database on Amazon Neptune
> (https://aws.amazon.com/neptune) and we have data coming from Google Cloud.
>
> So I was wondering if anyone has ever worked with a similar architecture
> and has considered developing an Amazon Neptune custom Beam I/O connector.
> Is it feasible? Is it worth it?
>
> Honestly I'm not that experienced with Apache Beam / Dataflow, so I'm not
> sure if something like that would make sense. Currently we're connecting
> Beam to AWS Kinesis and AWS S3, and from there, to Neptune.
>
> Thank you all very much in advance!
>
> Best,
> Gabriel Levcovitz
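
For reference, the write-side approach described in the reply (buffer records in ProcessElement, flush them in FinishBundle) can be sketched roughly as follows. This is a plain-Java model of the bundle lifecycle, not Beam code: in a real pipeline the class would extend DoFn and the three methods would carry the @StartBundle, @ProcessElement and @FinishBundle annotations. The names BulkLoadSink, uploadAndLoad and BufferingNeptuneWriter are hypothetical, and the sink stands in for "write a file to S3, notify Neptune, wait for ingestion", none of which is implemented here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the S3 upload plus Neptune bulk-load step.
 * A real implementation would write the records to a file on S3, start
 * the load, and block until Neptune reports that it has finished.
 */
interface BulkLoadSink {
    void uploadAndLoad(List<String> records) throws Exception;
}

/**
 * Plain-Java model of a buffering writer. In Beam this would be a
 * DoFn<String, Void>; the method names mirror the DoFn lifecycle.
 */
class BufferingNeptuneWriter {
    private final BulkLoadSink sink;
    private List<String> buffer;

    BufferingNeptuneWriter(BulkLoadSink sink) {
        this.sink = sink;
    }

    // Would be annotated @StartBundle: start each bundle with an empty buffer.
    void startBundle() {
        buffer = new ArrayList<>();
    }

    // Would be annotated @ProcessElement: only collect the record in memory.
    void processElement(String record) {
        buffer.add(record);
    }

    // Would be annotated @FinishBundle: hand the whole batch to the sink once.
    void finishBundle() throws Exception {
        if (buffer.isEmpty()) {
            return; // nothing to load for an empty bundle
        }
        sink.uploadAndLoad(new ArrayList<>(buffer)); // pass a copy of the batch
        buffer.clear();
    }
}
```

One design note: since the runner decides bundle sizes, the in-memory buffer here is unbounded within a bundle. A real connector would probably also flush once the buffer reaches some size limit.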
