On Wed, Apr 14, 2021 at 11:07 PM Daniel Collins <[email protected]> wrote:
> Hi Gabriel,
>
> Write-side adapters for systems tend to be easier to implement than
> read-side adapters. That being said, looking at the documentation for
> Neptune, it looks to me like there's no direct data load API, only a batch
> data load from a file on S3
> <https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-data.html>?
> This is usable, but perhaps a bit more difficult to work with.
>
> You could implement a write-side adapter for Neptune (either on your own
> or as a contribution to Beam) by writing a standard DoFn which, in its
> ProcessElement method, buffers received records in memory, and in its
> FinishBundle method, writes all collected records to a file on S3, notifies
> Neptune, and waits for Neptune to ingest them. You can see documentation on
> the DoFn API here
> <https://beam.apache.org/releases/javadoc/2.28.0/org/apache/beam/sdk/transforms/DoFn.html>.
> Someone else here might have more experience working with microbatch-style
> APIs like this, and could have more suggestions.

In fact, our BigQueryIO connector has a mode of operation that does batch
loads from files on GCS:

https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java

The connector overall is large and complex, because it is old and mature.
But it may be helpful as a point of reference.

Kenn

> A read-side adapter would likely be only a minimally higher lift. This
> could be done in a simple loading step (Create with a single element
> followed by MapElements), although much of the complexity likely lies in
> how to provide the necessary properties for cluster construction
> <https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-java.html>
> on the Beam worker, and how to define the query the user would need to
> execute. I'd also wonder if this could be done in an engine-agnostic way:
> "TinkerPopIO" instead of "NeptuneIO".
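The buffering DoFn described above can be sketched in plain Java. This is a minimal, self-contained illustration of the pattern only: in a real Beam DoFn the three methods would carry the @StartBundle, @ProcessElement, and @FinishBundle annotations, and the S3 write and Neptune loader calls (stubbed here as an in-memory list) would use the AWS SDK and Neptune's loader endpoint.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the micro-batching write pattern: buffer elements during a
// bundle, then flush the whole batch at once when the bundle finishes.
// The "upload" is a stub; real code would write a CSV to S3, call
// Neptune's bulk loader with the S3 URI, and poll the load status.
public class NeptuneBulkWriteSketch {
  private final List<String> buffer = new ArrayList<>();
  private final List<String> uploadedBatches = new ArrayList<>(); // stands in for S3

  // @StartBundle in a real DoFn: reset per-bundle state.
  public void startBundle() {
    buffer.clear();
  }

  // @ProcessElement in a real DoFn: just accumulate in memory.
  public void processElement(String record) {
    buffer.add(record);
  }

  // @FinishBundle in a real DoFn: write the batch to S3, notify
  // Neptune, and wait for ingestion to complete before returning.
  public void finishBundle() {
    if (buffer.isEmpty()) {
      return;
    }
    uploadedBatches.add(String.join("\n", buffer));
    buffer.clear();
  }

  public List<String> batches() {
    return uploadedBatches;
  }
}
```

Note that because FinishBundle blocks until Neptune confirms the load, bundle size directly controls how many bulk-load jobs the pipeline issues.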
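The read-side shape described above (Create with a single element, followed by MapElements) can likewise be sketched in plain Java. The Gremlin traversal is stubbed as a Function here; real code would build a TinkerPop Cluster on the Beam worker from the Neptune connection properties and run the user-supplied traversal remotely.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch of a read-side step: a one-element "seed" input (Create.of)
// exists only to trigger the query exactly once, and a map step
// (MapElements.via) expands it into the query's results.
public class NeptuneReadSketch {
  public static List<String> run(Function<String, List<String>> gremlinQuery) {
    // Create.of("seed"): a single-element input collection.
    List<String> seeds = Collections.singletonList("seed");
    // MapElements-style step: each seed maps to the query results.
    return seeds.stream()
        .flatMap(s -> gremlinQuery.apply(s).stream())
        .collect(Collectors.toList());
  }
}
```

The open design questions from the thread remain: how the connection properties and the traversal itself are supplied by the user, and whether the query function should target TinkerPop generically rather than Neptune specifically.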
> If you'd like to pursue adding such an integration,
> https://beam.apache.org/contribute/ provides documentation on the
> contribution process. Contributions to Beam are always appreciated!
>
> -Daniel
>
> On Thu, Apr 15, 2021 at 12:44 AM Gabriel Levcovitz <[email protected]>
> wrote:
>
>> Dear Beam Dev community,
>>
>> I'm working on a project where we have a graph database on Amazon Neptune
>> (https://aws.amazon.com/neptune) and we have data coming from Google
>> Cloud.
>>
>> So I was wondering if anyone has ever worked with a similar architecture
>> and has considered developing an Amazon Neptune custom Beam I/O connector.
>> Is it feasible? Is it worth it?
>>
>> Honestly, I'm not that experienced with Apache Beam / Dataflow, so I'm not
>> sure if something like that would make sense. Currently we're connecting
>> Beam to AWS Kinesis and AWS S3, and from there, to Neptune.
>>
>> Thank you all very much in advance!
>>
>> Best,
>> Gabriel Levcovitz
