Re: [Discuss graph source/sink design proposal]

Saikat Kanjilal Mon, 27 Jun 2016 20:46:55 -0700

Exactly right, I'm proposing we create a graph sink for flume while keeping the 
flume core intact.  The sink/source will write data to or read data from a 
graph database.  For reading from a graph db we need a mechanism to stream data 
from the graph store into flume.
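
To make the read/write split concrete, here is a minimal sketch in plain Java. 
It assumes nothing about Flume's real classes: SketchEvent, GraphStoreClient, 
InMemoryGraphStoreClient and GraphBridge are all hypothetical names, the "id" 
header is an assumed convention, and the in-memory store merely stands in for 
neo4j.  A real sink would extend Flume's AbstractSink and take events from a 
channel inside a transaction, which this sketch does not attempt.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a Flume event: string headers plus an opaque body. */
final class SketchEvent {
    final Map<String, String> headers;
    final byte[] body;

    SketchEvent(Map<String, String> headers, byte[] body) {
        this.headers = new HashMap<>(headers);
        this.body = body.clone();
    }
}

/** Hypothetical graph-store abstraction; a neo4j-backed class would implement it. */
interface GraphStoreClient {
    void insertNode(String id, Map<String, String> properties);
    boolean deleteNode(String id);
    List<Map<String, String>> readAllNodes();
}

/** In-memory implementation, present only to make the sketch runnable. */
final class InMemoryGraphStoreClient implements GraphStoreClient {
    private final Map<String, Map<String, String>> nodes = new LinkedHashMap<>();

    @Override
    public void insertNode(String id, Map<String, String> properties) {
        nodes.put(id, new HashMap<>(properties));
    }

    @Override
    public boolean deleteNode(String id) {
        return nodes.remove(id) != null;
    }

    @Override
    public List<Map<String, String>> readAllNodes() {
        List<Map<String, String>> copy = new ArrayList<>();
        for (Map<String, String> node : nodes.values()) {
            copy.add(new HashMap<>(node));
        }
        return copy;
    }
}

/** Sink side writes one node per event; source side turns stored nodes back into events. */
final class GraphBridge {
    private final GraphStoreClient client;

    GraphBridge(GraphStoreClient client) {
        this.client = client;
    }

    void write(SketchEvent event) {
        String id = event.headers.get("id"); // assumed convention: node id travels in a header
        if (id == null) {
            throw new IllegalArgumentException("event needs an 'id' header");
        }
        Map<String, String> props = new HashMap<>(event.headers);
        props.put("body", new String(event.body, StandardCharsets.UTF_8));
        client.insertNode(id, props);
    }

    List<SketchEvent> read() {
        List<SketchEvent> events = new ArrayList<>();
        for (Map<String, String> node : client.readAllNodes()) {
            Map<String, String> headers = new HashMap<>(node);
            String body = headers.remove("body");
            byte[] bytes = (body == null ? "" : body).getBytes(StandardCharsets.UTF_8);
            events.add(new SketchEvent(headers, bytes));
        }
        return events;
    }
}

final class GraphBridgeDemo {
    public static void main(String[] args) {
        GraphBridge bridge = new GraphBridge(new InMemoryGraphStoreClient());
        Map<String, String> headers = new HashMap<>();
        headers.put("id", "n1");
        bridge.write(new SketchEvent(headers, "hello".getBytes(StandardCharsets.UTF_8)));
        System.out.println(bridge.read().size());
    }
}
```

Keeping the store behind one small interface is what would let the same code 
serve both the sink and the source, and would leave the flume core untouched.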

Thoughts?

Sent from my iPhone

> On Jun 27, 2016, at 6:47 PM, Mike Percy <[email protected]> wrote:
> 
> Saikat:
> 
> The design is pretty high level but it's unclear to me which parts are
> changes to Flume core and which parts are built into a source or sink.
> 
> I would also like to understand how you think this will change Flume core.
> 
> If this is simply a proposal for a sink that can write to a graph database,
> without changing Flume APIs, then there is less to debate here.
> 
> Thanks,
> Mike
> 
> On Mon, Jun 20, 2016 at 12:35 PM, Saikat Kanjilal <[email protected]>
> wrote:
> 
>> I'm only thinking about neo4j at the moment, and yes, I am aware of datastax
>> snatching up titan.  Thoughts on next steps?  Should I forge ahead and, when
>> I get the code compiling, send a sample pull request?
>> 
>>> From: [email protected]
>>> Date: Mon, 20 Jun 2016 22:29:32 +0300
>>> Subject: Re: [Discuss graph source/sink design proposal]
>>> To: [email protected]
>>> 
>>> Hi Saikat,
>>> I think that a neo4j sink is a good idea. Re/ a titan sink, I think you
>>> should not bother implementing that (see
>>> http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/
>>> and http://stackoverflow.com/a/28596112).
>>> 
>>> On Mon, Jun 20, 2016 at 10:21 PM, Saikat Kanjilal <[email protected]>
>>> wrote:
>>> 
>>>> Lior et al, any comments on my proposal/design?  I would love to begin the
>>>> coding effort and have this ready for the 1.8 release.  Thanks
>>>> 
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: RE: [Discuss graph source/sink design proposal]
>>>> Date: Mon, 13 Jun 2016 12:38:11 -0700
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Hari/MikeP, I've had this proposal open for many months now.  Is there any
>>>> way you guys can take a look at the jira and design proposal and provide
>>>> feedback?  Thanks
>>>> 
>>>>> From: [email protected]
>>>>> Date: Mon, 13 Jun 2016 22:09:58 +0300
>>>>> Subject: Re: [Discuss graph source/sink design proposal]
>>>>> To: [email protected]
>>>>> 
>>>>> Got it.
>>>>> 
>>>>> On Mon, Jun 13, 2016 at 10:05 PM, Saikat Kanjilal <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> That's the responsibility of the graph db, not flume.  Flume is
>>>>>> responsible for delivering the events and has no understanding of the
>>>>>> connectivity of the data.  The goal in using flume is to connect incoming
>>>>>> data that is heterogeneous and transform that data before dumping it into
>>>>>> the graph db.
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On Jun 13, 2016, at 11:09 AM, Lior Zeno <[email protected]> wrote:
>>>>>>> 
>>>>>>> I got this part. How are events linked together? Do you expect an
>>>>>>> adjacency list incorporated in the header?
>>>>>>> 
>>>>>>> On Mon, Jun 13, 2016 at 8:59 PM, Saikat Kanjilal <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> The use case is a flume developer wanting to connect data coming into
>>>>>>>> and out of flume sinks/sources to a graph database
>>>>>>>> 
>>>>>>>> Sent from my iPhone
>>>>>>>> 
>>>>>>>>> On Jun 13, 2016, at 10:55 AM, Lior Zeno <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> I'm not sure that I follow here. Can you please give a detailed
>>>>>>>>> use-case?
>>>>>>>>> 
>>>>>>>>>> On Mon, Jun 13, 2016 at 7:20 AM, Lior Zeno <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Thanks. I'll review this and share my comments later on today.
>>>>>>>>>>> On Jun 13, 2016 2:30 AM, "Saikat Kanjilal" <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Motivation/Design: The graph source/sink plugin will be used to
>>>>>>>>>>> define custom transformations on connected data and dynamically
>>>>>>>>>>> apply these transformations to send data to any sink; examples of
>>>>>>>>>>> destination sinks include elasticsearch, relational databases,
>>>>>>>>>>> spark rdd, etc.  Note that this plugin will serve as a source or a
>>>>>>>>>>> sink depending on the configuration.  For v1 I am targeting that we
>>>>>>>>>>> plug into the neo4j database using the neo4j-jdbc interface
>>>>>>>>>>> (https://github.com/larusba/neo4j-jdbc) to build http payloads to
>>>>>>>>>>> talk to neo4j.  Once built, our neo4j interface will allow us to
>>>>>>>>>>> build generic interfaces and plug in any graph store in the future.
>>>>>>>>>>> 
>>>>>>>>>>> The design will consist of a hybrid piece of infrastructure serving
>>>>>>>>>>> both as a source and a sink connected to the current flume
>>>>>>>>>>> infrastructure (since all the current sinks and sources live in
>>>>>>>>>>> their own directories, I would suggest this live somewhere else in
>>>>>>>>>>> the flume directory structure).  Listed below are some classes I
>>>>>>>>>>> have partially configured to kick off this discussion.
>>>>>>>>>>> 
>>>>>>>>>>> NeoRestClient
>>>>>>>>>>> Roles and Responsibilities: Interface to neo4j, pack and unpack data
>>>>>>>>>>> structures to perform CRUD operations on a local or remote neo4j
>>>>>>>>>>> instance
>>>>>>>>>>> APIs:
>>>>>>>>>>> //inputs: flume event
>>>>>>>>>>> //outputs: flume data structure identifying success metrics around
>>>>>>>>>>> //the operation
>>>>>>>>>>> //description: transform the flume event into a graph node
>>>>>>>>>>> insertNode(NeoNode nodeToInsert)
>>>>>>>>>>> searchNode(NeoNode nodeToSearch, Algorithm useAStarOrDijkstra)
>>>>>>>>>>> deleteNode(NeoNode nodeToDelete)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Note that I would also like to offer up the chance to present Cypher
>>>>>>>>>>> queries (http://neo4j.com/developer/cypher-query-language/) to the
>>>>>>>>>>> source/sink infrastructure
>>>>>>>>>>> 
>>>>>>>>>>> Neo4jDynamicSerializer
>>>>>>>>>>> Roles and responsibilities: serialize flume headers and body and use
>>>>>>>>>>> the Neo4jRestClient to perform CRUD on neo4j
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Both the source and the sink infrastructure will use the same
>>>>>>>>>>> infrastructure above.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> That should be enough of a first cut for design/motivation and JIRA
>>>>>>>>>>> details, would love to kick off the discussion at this point.
>>>>>>>>>>> Thanks in advance
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> From: [email protected]
>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>> Subject: [Discuss graph source/sink design proposal]
>>>>>>>>>>>> Date: Sun, 12 Jun 2016 15:01:14 -0700
>>>>>>>>>>>> 
>>>>>>>>>>>> Jira with details here:
>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLUME-2035
>>>>>>>>>>>> 
>>>>>>>>>>>> Please respond with your questions.
>> 
>>
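
For the Neo4jDynamicSerializer role quoted above (serialize flume headers and 
body, then hand the result to the REST client), a minimal sketch of the 
serialization step might look like the following.  CypherStatement and 
EventToCypherSerializer are hypothetical names, the "body" property key and 
the single-label scheme are assumptions, and the sketch only builds a 
parameterized Cypher statement; it does not talk to neo4j.  It uses the 
$props parameter syntax, so a deployment on an older neo4j would need the 
older parameter form instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for a parameterized Cypher statement. */
final class CypherStatement {
    final String text;
    final Map<String, Object> parameters;

    CypherStatement(String text, Map<String, Object> parameters) {
        this.text = text;
        this.parameters = Collections.unmodifiableMap(new LinkedHashMap<>(parameters));
    }
}

/** Hypothetical serializer: one event becomes one CREATE statement. */
final class EventToCypherSerializer {
    private final String label;

    EventToCypherSerializer(String label) {
        // Labels cannot be passed as Cypher parameters, so the label is
        // restricted to a conservative identifier pattern before it is
        // concatenated into the statement text.
        if (!label.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unsafe label: " + label);
        }
        this.label = label;
    }

    CypherStatement serialize(Map<String, String> headers, byte[] body) {
        // Headers become node properties; the body is stored as a UTF-8 string
        // under the assumed key "body".
        Map<String, Object> props = new LinkedHashMap<>(headers);
        props.put("body", new String(body, StandardCharsets.UTF_8));

        // All event data travels as a parameter map, never as statement text.
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("props", props);
        return new CypherStatement("CREATE (n:" + label + " $props)", params);
    }
}
```

Passing the event data as a parameter map rather than splicing it into the 
query keeps arbitrary header values from being interpreted as Cypher.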
