Exactly right, I'm proposing we create a graph sink for Flume while keeping the Flume core intact. The sink/source will write data to, or read data from, a graph database. To read from a graph db we need a mechanism to stream data from the graph store into Flume.
Thoughts?

Sent from my iPhone

> On Jun 27, 2016, at 6:47 PM, Mike Percy <[email protected]> wrote:
>
> Saikat:
>
> The design is pretty high level but it's unclear to me which parts are
> changes to Flume core and which parts are built into a source or sink.
>
> I would also like to understand how you think this will change Flume core.
>
> If this is simply a proposal for a sink that can write to a graph database,
> without changing Flume APIs, then there is less to debate here.
>
> Thanks,
> Mike
>
> On Mon, Jun 20, 2016 at 12:35 PM, Saikat Kanjilal <[email protected]>
> wrote:
>
>> I'm only thinking about neo4j at the moment, yes I am aware of datastax
>> snatching up titan. Thoughts on next steps? Should I forge ahead and, when
>> I get the code compiling, send a sample pull request?
>>
>>> From: [email protected]
>>> Date: Mon, 20 Jun 2016 22:29:32 +0300
>>> Subject: Re: [Discuss graph source/sink design proposal]
>>> To: [email protected]
>>>
>>> Hi Saikat,
>>> I think that a neo4j sink is a good idea. Re/ a titan sink, I think you
>>> should not bother implementing that (see
>>> http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/
>>> and http://stackoverflow.com/a/28596112).
>>>
>>> On Mon, Jun 20, 2016 at 10:21 PM, Saikat Kanjilal <[email protected]>
>>> wrote:
>>>
>>>> Lior et al, any comments on my proposal/design? Would love to begin the
>>>> coding effort and have this ready for the 1.8 release. Thanks
>>>>
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: RE: [Discuss graph source/sink design proposal]
>>>> Date: Mon, 13 Jun 2016 12:38:11 -0700
>>>>
>>>> Hari/MikeP, I've had this proposal open for many months now, is there any
>>>> way you guys can take a look at the jira and design proposal and provide
>>>> feedback?
>>>> Thanks
>>>>
>>>>> From: [email protected]
>>>>> Date: Mon, 13 Jun 2016 22:09:58 +0300
>>>>> Subject: Re: [Discuss graph source/sink design proposal]
>>>>> To: [email protected]
>>>>>
>>>>> Got it.
>>>>>
>>>>> On Mon, Jun 13, 2016 at 10:05 PM, Saikat Kanjilal <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> That's a responsibility of the graph db, not flume. Flume is responsible
>>>>>> for delivering the events and has no understanding of the connectivity of
>>>>>> the data. The goal in using flume is to connect incoming data that is
>>>>>> heterogeneous and transform that data before dumping it into the graph db.
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>>> On Jun 13, 2016, at 11:09 AM, Lior Zeno <[email protected]> wrote:
>>>>>>>
>>>>>>> I got this part. How are events linked together? Do you expect an
>>>>>>> adjacency list incorporated in the header?
>>>>>>>
>>>>>>> On Mon, Jun 13, 2016 at 8:59 PM, Saikat Kanjilal <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The use case is a flume developer wanting to connect data coming into
>>>>>>>> and out of flume sinks/sources to a graph database
>>>>>>>>
>>>>>>>> Sent from my iPhone
>>>>>>>>
>>>>>>>>> On Jun 13, 2016, at 10:55 AM, Lior Zeno <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> I'm not sure that I follow here. Can you please give a detailed
>>>>>>>>> use-case?
>>>>>>>>>
>>>>>>>>>> On Mon, Jun 13, 2016 at 7:20 AM, Lior Zeno <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks. I'll review this and share my comments later on today.
>>>>>>>>>>
>>>>>>>>>>> On Jun 13, 2016 2:30 AM, "Saikat Kanjilal" <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Motivation/Design: The graph source/sink plugin will be used to
>>>>>>>>>>> apply custom transformations to connected data and dynamically
>>>>>>>>>>> apply these transformations to send data to any sink; examples of
>>>>>>>>>>> destination sinks include elasticsearch/relational databases/spark
>>>>>>>>>>> rdd etc. Note that this plugin will serve as a source and a sink
>>>>>>>>>>> depending on the configurations. For v1 I am targeting that we plug
>>>>>>>>>>> into the neo4j database using the neo4j-jdbc interface
>>>>>>>>>>> (https://github.com/larusba/neo4j-jdbc)
>>>>>>>>>>> to build http payloads to talk to neo4j. Our neo4j interface will
>>>>>>>>>>> allow us to build generic interfaces and plug in any graph store in
>>>>>>>>>>> the future.
>>>>>>>>>>>
>>>>>>>>>>> The design will consist of a hybrid piece of infrastructure serving
>>>>>>>>>>> both as a source and a sink connected to the current flume
>>>>>>>>>>> infrastructure (since all the current sinks and sources are living
>>>>>>>>>>> in their own directories I would suggest this live somewhere else
>>>>>>>>>>> in the flume directory structure). Listed below are some classes I
>>>>>>>>>>> have partially configured to kick off this discussion.
>>>>>>>>>>>
>>>>>>>>>>> NeoRestClient
>>>>>>>>>>> Roles and Responsibilities: Interface to neo4j, unpack and pack
>>>>>>>>>>> data structures to perform CRUD operations on a local or remote
>>>>>>>>>>> neo4j instance
>>>>>>>>>>> APIs:
>>>>>>>>>>> //inputs flume event
>>>>>>>>>>> //outputs flume data structure identifying success metrics around the
>>>>>>>>>>> operation
>>>>>>>>>>> //description: transform the flume event into a graph node
>>>>>>>>>>> insertNode(NeoNode nodeToInsert)
>>>>>>>>>>> searchNode(NeoNode nodeToSearch,Algorithm useAStarOrDijkstra)
>>>>>>>>>>> deleteNode(NeoNode nodeToDelete)
>>>>>>>>>>>
>>>>>>>>>>> Note that I would also like to offer up the chance to present
>>>>>>>>>>> Cypher queries (http://neo4j.com/developer/cypher-query-language/)
>>>>>>>>>>> to the source/sink infrastructure
>>>>>>>>>>>
>>>>>>>>>>> Neo4jDynamicSerializer
>>>>>>>>>>> Roles and responsibilities: serialize flume headers and body and
>>>>>>>>>>> use the Neo4jRestClient to perform CRUD on neo4j
>>>>>>>>>>>
>>>>>>>>>>> Both the source and the sink infrastructure will use the same
>>>>>>>>>>> infrastructure above.
>>>>>>>>>>>
>>>>>>>>>>> That should be enough of a first cut for design/motivation and JIRA
>>>>>>>>>>> details, would love to kick off the discussion at this point.
>>>>>>>>>>> Thanks in advance
>>>>>>>>>>>
>>>>>>>>>>>> From: [email protected]
>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>> Subject: [Discuss graph source/sink design proposal]
>>>>>>>>>>>> Date: Sun, 12 Jun 2016 15:01:14 -0700
>>>>>>>>>>>>
>>>>>>>>>>>> Jira with details here:
>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLUME-2035
>>>>>>>>>>>>
>>>>>>>>>>>> Please respond with your questions.
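
To make the API list in the quoted proposal a bit more concrete, here is a rough Java sketch of how the client and serializer roles might fit together. It is only an illustration under assumptions, not the proposed implementation: GraphSinkSketch, GraphClient, InMemoryGraphClient and toNode are hypothetical names, NeoNode is reduced to an id plus string properties, the event is modelled as a plain header map and byte array rather than a Flume Event, the "id" header is an assumed convention, and searchNode looks up by id only (the Algorithm parameter from the proposal is left out). Nothing here talks to neo4j or depends on Flume.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative sketch only; none of these types exist in Flume or neo4j. */
public class GraphSinkSketch {

    /** Hypothetical node: an id plus string properties. */
    static final class NeoNode {
        final String id;
        final Map<String, String> properties;

        NeoNode(String id, Map<String, String> properties) {
            this.id = id;
            this.properties = new HashMap<>(properties);
        }
    }

    /** Hypothetical store-agnostic client, named after the operations in the proposal. */
    interface GraphClient {
        boolean insertNode(NeoNode nodeToInsert);
        Optional<NeoNode> searchNode(String id);
        boolean deleteNode(String id);
    }

    /** In-memory stand-in so the sketch runs without a graph database. */
    static final class InMemoryGraphClient implements GraphClient {
        private final Map<String, NeoNode> nodes = new HashMap<>();

        @Override
        public boolean insertNode(NeoNode nodeToInsert) {
            // Returns false if a node with the same id is already stored.
            return nodes.putIfAbsent(nodeToInsert.id, nodeToInsert) == null;
        }

        @Override
        public Optional<NeoNode> searchNode(String id) {
            return Optional.ofNullable(nodes.get(id));
        }

        @Override
        public boolean deleteNode(String id) {
            return nodes.remove(id) != null;
        }
    }

    /**
     * Hypothetical serializer step: headers become node properties and the
     * body is stored as UTF-8 text under the "body" property.
     * Assumption: every event carries an "id" header.
     */
    static NeoNode toNode(Map<String, String> headers, byte[] body) {
        String id = headers.get("id");
        if (id == null) {
            throw new IllegalArgumentException("event is missing the assumed 'id' header");
        }
        Map<String, String> props = new HashMap<>(headers);
        props.put("body", new String(body, StandardCharsets.UTF_8));
        return new NeoNode(id, props);
    }

    public static void main(String[] args) {
        GraphClient client = new InMemoryGraphClient();
        Map<String, String> headers = new HashMap<>();
        headers.put("id", "event-1");
        headers.put("host", "web01");

        NeoNode node = toNode(headers, "login".getBytes(StandardCharsets.UTF_8));
        System.out.println("inserted: " + client.insertNode(node));
        System.out.println("found: " + client.searchNode("event-1").isPresent());
        System.out.println("deleted: " + client.deleteNode("event-1"));
    }
}
```

Keeping the store behind a small interface like this is one way a neo4j-specific client could later be swapped for another graph store, and it leaves Flume core untouched since only the sink/source side would depend on it.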
