On Wed, Apr 14, 2021 at 11:07 PM Daniel Collins <[email protected]> wrote:
> Hi Gabriel,
>
> Write-side adapters for systems tend to be easier to implement than
> read-side adapters. That being said, looking at the documentation for
> Neptune, it looks to me like there's no direct data load API, only a batch
> data load from a file on S3
> <https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load-data.html>?
> This is usable, but perhaps a bit more difficult to work with.
>
> You could implement a write-side adapter for Neptune (either on your own
> or as a contribution to Beam) by writing a standard DoFn which, in its
> ProcessElement method, buffers received records in memory, and in its
> FinishBundle method, writes all collected records to a file on S3, notifies
> Neptune, and waits for Neptune to ingest them. You can see documentation on
> the DoFn API here
> <https://beam.apache.org/releases/javadoc/2.28.0/org/apache/beam/sdk/transforms/DoFn.html>.
> Someone else here might have more experience working with microbatch-style
> APIs like this, and could have more suggestions.

In fact, our BigQueryIO connector has a mode of operation that does batch
loads from files on GCS:

https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java

The connector overall is large and complex, because it is old and mature.
But it may be helpful as a point of reference.

Kenn

> A read-side adapter would likely be only a minimally higher lift. This
> could be done in a simple loading step (Create with a single element
> followed by MapElements), although much of the complexity likely lies in
> how to provide the necessary properties for cluster construction
> <https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-java.html>
> on the Beam worker, and how to define the query the user would need to
> execute. I'd also wonder if this could be done in an engine-agnostic way:
> "TinkerPopIO" instead of "NeptuneIO".
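The buffering DoFn described above can be sketched in plain Java. This is a minimal, self-contained illustration of the pattern only: in a real Beam DoFn the three methods would carry the @StartBundle, @ProcessElement, and @FinishBundle annotations, and the S3 write and Neptune loader calls (stubbed here as an in-memory list) would use the AWS SDK and Neptune's loader endpoint.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the micro-batching write pattern: buffer elements during a
// bundle, then flush the whole batch at once when the bundle finishes.
// The "upload" is a stub; real code would write a CSV to S3, call
// Neptune's bulk loader with the S3 URI, and poll the load status.
public class NeptuneBulkWriteSketch {
  private final List<String> buffer = new ArrayList<>();
  private final List<String> uploadedBatches = new ArrayList<>(); // stands in for S3

  // @StartBundle in a real DoFn: reset per-bundle state.
  public void startBundle() {
    buffer.clear();
  }

  // @ProcessElement in a real DoFn: just accumulate in memory.
  public void processElement(String record) {
    buffer.add(record);
  }

  // @FinishBundle in a real DoFn: write the batch to S3, notify
  // Neptune, and wait for ingestion to complete before returning.
  public void finishBundle() {
    if (buffer.isEmpty()) {
      return;
    }
    uploadedBatches.add(String.join("\n", buffer));
    buffer.clear();
  }

  public List<String> batches() {
    return uploadedBatches;
  }
}
```

Note that because FinishBundle blocks until Neptune confirms the load, bundle size directly controls how many bulk-load jobs the pipeline issues.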
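The read-side shape described above (Create with a single element, followed by MapElements) can likewise be sketched in plain Java. The Gremlin traversal is stubbed as a Function here; real code would build a TinkerPop Cluster on the Beam worker from the Neptune connection properties and run the user-supplied traversal remotely.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch of a read-side step: a one-element "seed" input (Create.of)
// exists only to trigger the query exactly once, and a map step
// (MapElements.via) expands it into the query's results.
public class NeptuneReadSketch {
  public static List<String> run(Function<String, List<String>> gremlinQuery) {
    // Create.of("seed"): a single-element input collection.
    List<String> seeds = Collections.singletonList("seed");
    // MapElements-style step: each seed maps to the query results.
    return seeds.stream()
        .flatMap(s -> gremlinQuery.apply(s).stream())
        .collect(Collectors.toList());
  }
}
```

The open design questions from the thread remain: how the connection properties and the traversal itself are supplied by the user, and whether the query function should target TinkerPop generically rather than Neptune specifically.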
> If you'd like to pursue adding such an integration,
> https://beam.apache.org/contribute/ provides documentation on the
> contribution process. Contributions to Beam are always appreciated!
>
> -Daniel
>
> On Thu, Apr 15, 2021 at 12:44 AM Gabriel Levcovitz <[email protected]>
> wrote:
>
>> Dear Beam Dev community,
>>
>> I'm working on a project where we have a graph database on Amazon Neptune
>> (https://aws.amazon.com/neptune) and we have data coming from Google
>> Cloud.
>>
>> So I was wondering if anyone has ever worked with a similar architecture
>> and has considered developing an Amazon Neptune custom Beam I/O connector.
>> Is it feasible? Is it worth it?
>>
>> Honestly, I'm not that experienced with Apache Beam / Dataflow, so I'm not
>> sure if something like that would make sense. Currently we're connecting
>> Beam to AWS Kinesis and AWS S3, and from there, to Neptune.
>>
>> Thank you all very much in advance!
>>
>> Best,
>> Gabriel Levcovitz
