RE: [Discuss graph source/sink design proposal]

Saikat Kanjilal Tue, 28 Jun 2016 08:31:11 -0700

:) I'm using Kudu at work at the moment to troubleshoot some Tomcat issues.
Regarding where to keep the source code, I would say for now let's go with
the plugin approach and revisit the "where does the code live" conversation
later.

One thing I do want to discuss is that the plugin will act as a source
or a sink depending on configuration. If the plugin acts as a source, we need
a mechanism (like a daemon in syslog) to stream changes in real time from a
graph DB into Flume. I was wondering if there are any past approaches around
this that I can follow. I may need to dig into the Neo4j kernel to see where we
can inject something like this.
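
To make the source side concrete, here is a rough sketch of one possible
shape: a checkpointed polling loop, which is a simpler stand-in for true
kernel-level streaming. Everything in it is hypothetical. GraphChangePoller,
Change and ChangeFeed are made-up names, ChangeFeed is not a Neo4j API, and
the queue only stands in for a Flume channel. In a real plugin this logic
would presumably sit behind Flume's PollableSource interface.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: forward graph changes newer than a checkpoint. */
public class GraphChangePoller {

    /** Hypothetical change record; a real source would wrap it in a Flume event. */
    public static final class Change {
        public final long sequence;
        public final String payload;

        public Change(long sequence, String payload) {
            this.sequence = sequence;
            this.payload = payload;
        }
    }

    /** Hypothetical stand-in for the graph store (assumption, not a Neo4j API). */
    public interface ChangeFeed {
        /** Returns changes with sequence greater than the argument, in ascending order. */
        List<Change> changesAfter(long sequence);
    }

    private final ChangeFeed feed;
    private final BlockingQueue<Change> channel; // stands in for a Flume channel
    private long lastSeen = 0L;                  // checkpoint: highest sequence forwarded

    public GraphChangePoller(ChangeFeed feed, BlockingQueue<Change> channel) {
        this.feed = feed;
        this.channel = channel;
    }

    /** One polling pass. Returns the number of changes forwarded. */
    public int pollOnce() {
        int forwarded = 0;
        for (Change change : feed.changesAfter(lastSeen)) {
            if (!channel.offer(change)) {
                break; // channel full: keep the checkpoint so this change is retried
            }
            lastSeen = change.sequence;
            forwarded++;
        }
        return forwarded;
    }
}
```

The checkpoint only advances after a change has been handed to the channel,
so a full channel delays changes rather than dropping them. Whether Neo4j
exposes a hook that could back something like ChangeFeed is exactly the part
I still need to dig into.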
Thoughts on that?


> From: [email protected]
> Date: Tue, 28 Jun 2016 00:27:45 -0700
> Subject: Re: [Discuss graph source/sink design proposal]
> To: [email protected]
> 
> Hi Saikat,
> Please see my thoughts inline. This is how I think about this stuff; others
> may think about it differently.
> 
> On Mon, Jun 27, 2016 at 8:45 PM, Saikat Kanjilal <[email protected]>
> wrote:
> 
> > Exactly right, I'm proposing we create a graph sink for flume while
> > keeping the flume core intact.
> 
> 
> As you are probably aware, sources and sinks don't have to be part of the
> main Apache Flume source tree to be used with Flume. The plugins.d
> mechanism described in [1] makes building and integrating separate plugins
> into Flume an easy thing to do at deployment time.
> 
> In another project I work on, Apache Kudu (incubating), we have a Flume
> Kudu sink committed in the main source tree [2]. We may at some point
> propose to move it into the Flume source tree, but for now (for testing and
> API stability reasons) it's easier to keep it in the Kudu source tree.
> 
> Likewise, you could implement a Flume Neo4J sink and post it up on GitHub
> (or maybe in the Neo4J tree?). Donating it to the Apache Flume project once
> it's in decent shape may make sense at some point, especially if the
> dependencies are easy to share and integrate into the Flume project.
> However, I wouldn't say that it's a foregone conclusion that it really
> needs to be part of the Flume source tree. Assuming you need the sink, and
> are going to implement it anyway, then maybe we can defer the discussion of
> whether to include it in the Flume source tree until later. One of the
> things I try to keep in mind when integrating new plugin code is whether
> the project will be able to support the maintenance burden of the new code.
> 
> > In reading from a graph db we need a mechanism to stream data from the
> > graph store into flume.
> >
> 
> Yes, I'd say it could potentially make sense to create a Flume Neo4J source
> as well. I think the same logic as above would still apply.
> 
> Regards,
> Mike
> 
> [1]
> https://flume.apache.org/FlumeUserGuide.html#installing-third-party-plugins
> [2]
> https://github.com/apache/incubator-kudu/tree/master/java/kudu-flume-sink
