Re: [Question] Amazon Neptune I/O connector

Daniel Collins Wed, 14 Apr 2021 23:06:59 -0700

Hi Gabriel,

Write-side adapters tend to be easier to implement than read-side adapters.
That being said, looking at the documentation for Neptune, it looks to me
like there's no direct data load API, only a batch data load from a file on
S3
<https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-data.html>.
This is usable but perhaps a bit more difficult to work with.

You could implement a write-side adapter for Neptune (either on your own or
as a contribution to Beam) by writing a standard DoFn which, in its
ProcessElement method, buffers received records in memory, and in its
FinishBundle method, writes all collected records to a file on S3, notifies
Neptune, and waits for Neptune to ingest them. You can see documentation on
the DoFn API here
<https://beam.apache.org/releases/javadoc/2.28.0/org/apache/beam/sdk/transforms/DoFn.html>.
Someone else here might have more experience working with microbatch-style
APIs like this, and could have more suggestions.
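
As a rough illustration of the buffer-then-flush pattern described above,
here is a minimal sketch in plain Python. It only mirrors the DoFn bundle
lifecycle (start_bundle / process / finish_bundle in the Python SDK,
@StartBundle / @ProcessElement / @FinishBundle in Java) and does not import
Beam. The class name and the three callables (upload, start_load,
wait_for_load) are hypothetical placeholders for the S3 upload, the Neptune
loader request, and the load-status polling; none of them are real Beam or
AWS APIs.

```python
import csv
import io
import uuid
from typing import Callable, Dict, List


class BufferedNeptuneWriter:
    """Hypothetical sketch: buffer records per bundle, flush them as one CSV file."""

    def __init__(self,
                 upload: Callable[[str, str], None],      # assumed: (key, body) -> None
                 start_load: Callable[[str], str],        # assumed: key -> load id
                 wait_for_load: Callable[[str], None],    # assumed: blocks until ingested
                 key_prefix: str = "neptune-staging/"):
        self._upload = upload
        self._start_load = start_load
        self._wait_for_load = wait_for_load
        self._key_prefix = key_prefix
        self._buffer: List[Dict[str, str]] = []

    def start_bundle(self) -> None:
        # Each bundle starts with an empty buffer.
        self._buffer = []

    def process(self, record: Dict[str, str]) -> None:
        # Per-element work is only an in-memory append.
        self._buffer.append(record)

    def finish_bundle(self) -> None:
        if not self._buffer:
            return  # nothing to load for an empty bundle
        # Use the union of all keys as the CSV header; missing values become "".
        fields = sorted({name for record in self._buffer for name in record})
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(self._buffer)
        # A random key keeps concurrent workers from overwriting each other's files.
        key = f"{self._key_prefix}{uuid.uuid4().hex}.csv"
        self._upload(key, out.getvalue())
        load_id = self._start_load(key)
        self._wait_for_load(load_id)
        self._buffer = []
```

Passing the three operations in as callables keeps the sketch testable
without AWS credentials. In a real DoFn they would need to be replaced by
serializable configuration plus clients constructed on the worker.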

A read-side API would likely be only a slightly higher lift. This could be
done in a simple loading step (Create with a single element followed by
MapElements), although much of the complexity likely lies in how to
provide the necessary properties for the cluster construction
<https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-java.html>
on the Beam worker task, and how to define the query the user would need to
execute. I'd also wonder if this could be done in an engine-agnostic way,
"TinkerPopIO" instead of "NeptuneIO".
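
To make the read-side idea a little more concrete, here is a minimal sketch
in plain Python of the function that would run on the worker for the single
seed element. It assumes the connection properties travel as a small
serializable config object and that the client is constructed on the worker
from that config. ConnectionConfig, GraphClient, read_graph, and the connect
factory are all hypothetical names, not Beam, TinkerPop, or Neptune APIs.
Taking the factory as a parameter is one way the step could stay
engine-agnostic.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, Protocol


class GraphClient(Protocol):
    """Hypothetical minimal client interface any TinkerPop-style engine could satisfy."""

    def submit(self, query: str) -> Iterable[Any]: ...

    def close(self) -> None: ...


@dataclass(frozen=True)
class ConnectionConfig:
    """Plain, serializable connection properties shipped to the worker."""
    endpoint: str
    port: int = 8182       # assumed default; adjust for the target engine
    use_ssl: bool = True


def read_graph(query: str,
               config: ConnectionConfig,
               connect: Callable[[ConnectionConfig], GraphClient]) -> Iterator[Any]:
    """Run one query on the worker and emit each result as an element."""
    # The client is built here, on the worker, rather than serialized with the pipeline.
    client = connect(config)
    try:
        yield from client.submit(query)
    finally:
        # Close the connection even if the consumer stops reading early.
        client.close()
```

In a pipeline, the query string would be the single seed element, and
read_graph would be applied to it so that each result becomes one output
element.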

If you'd like to pursue adding such an integration,
https://beam.apache.org/contribute/ provides documentation on the
contribution process. Contributions to Beam are always appreciated!

-Daniel

On Thu, Apr 15, 2021 at 12:44 AM Gabriel Levcovitz wrote:

> Dear Beam Dev community,
>
> I'm working on a project where we have a graph database on Amazon Neptune (
> https://aws.amazon.com/neptune) and we have data coming from Google Cloud.
>
> So I was wondering if anyone has ever worked with a similar architecture
> and has considered developing an Amazon Neptune custom Beam I/O connector.
> Is it feasible? Is it worth it?
>
> Honestly I'm not that experienced with Apache Beam / Dataflow, so I'm not
> sure if something like that would make sense. Currently we're connecting
> Beam to AWS Kinesis and AWS S3, and from there, to Neptune.
>
> Thank you all very much in advance!
>
> Best,
> Gabriel Levcovitz
>
