Hmm, maybe a different Kudu project? Not sure. Anyway, this type of "changelog" thing would require support in the DB for streaming its write-ahead log or something. For example, we don't support that in Apache Kudu (incubating) -- maybe someday.
Regarding Flume, I usually think it's useful to distinguish between a source and a sink. They are typically written as separate classes and they represent different interfaces at the Flume Java API level. So, how would one write a streaming database source? That really depends on the database and the APIs it provides for that.

Mike

On Tue, Jun 28, 2016 at 8:30 AM, Saikat Kanjilal <[email protected]> wrote:

> :) I'm using Kudu at work at the moment to troubleshoot some Tomcat
> issues, regarding the where to keep the source code I would say for now
> lets go with the plugin approach and revisit the "where does the code live"
> conversation later. One thing I do want to discuss is that the plugin will
> act as a source or a sink depending on configuration, so if the plugin acts
> as a source we need a mechanism (like a daemon in syslog) to stream changes
> real time from a graphdb into flume, I was wondering if there are any past
> approaches around this that I can follow, I may need to dig into the neo4j
> kernel to see where we can inject something like this.
> Thoughts on that?
>
> > From: [email protected]
> > Date: Tue, 28 Jun 2016 00:27:45 -0700
> > Subject: Re: [Discuss graph source/sink design proposal]
> > To: [email protected]
> >
> > Hi Saikat,
> > Please see my thoughts inline. This is how I think about this stuff;
> > others may think about it differently.
> >
> > On Mon, Jun 27, 2016 at 8:45 PM, Saikat Kanjilal <[email protected]>
> > wrote:
> >
> > > Exactly right, I'm proposing we create a graph sink for flume while
> > > keeping the flume core intact.
> >
> > As you are probably aware, sources and sinks don't have to be part of the
> > main Apache Flume source tree to be used with Flume. The plugins.d
> > mechanism described in [1] makes building and integrating separate
> > plugins into Flume an easy thing to do at deployment time.
> >
> > In another project I work on, Apache Kudu (incubating), we have a Flume
> > Kudu sink committed in the main source tree [2]. We may at some point
> > propose to move it into the Flume source tree, but for now (for testing
> > and API stability reasons) it's easier to keep it in the Kudu source tree.
> >
> > Likewise, you could implement a Flume Neo4J sink and post it up on GitHub
> > (or maybe in the Neo4J tree?). Donating it to the Apache Flume project
> > once it's in decent shape may make sense at some point, especially if the
> > dependencies are easy to share and integrate into the Flume project.
> > However, I wouldn't say that it's a foregone conclusion that it really
> > needs to be part of the Flume source tree. Assuming you need the sink,
> > and are going to implement it anyway, then maybe we can defer the
> > discussion of whether to include it in the Flume source tree until later.
> > One of the things I try to keep in mind when integrating new plugin code
> > is whether the project will be able to support the maintenance burden of
> > the new code.
> >
> > > In reading from a graph db we need a mechanism to stream data from the
> > > graph store into flume.
> >
> > Yes, I'd say it could potentially make sense to create a Flume Neo4J
> > source as well. I think the same logic as above would still apply.
> >
> > Regards,
> > Mike
> >
> > [1] https://flume.apache.org/FlumeUserGuide.html#installing-third-party-plugins
> > [2] https://github.com/apache/incubator-kudu/tree/master/java/kudu-flume-sink
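
For illustration only, here is a rough sketch of the source/sink split discussed in this thread, written against made-up types rather than the real Flume API. Everything in it is an assumption: `ChangeLog`, `Change`, `PollingChangeSource`, `CollectingSink` and `InMemoryChangeLog` are hypothetical names, the `java.util.Queue` only stands in for a Flume channel (real channels are transactional), and `readAfter` is the kind of call a database would have to offer before a streaming source could be written. Neither Neo4j nor Kudu is claimed to provide it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class SourceSinkSketch {

    /** One entry in the hypothetical changelog. */
    static final class Change {
        final long seq;
        final String payload;
        Change(long seq, String payload) { this.seq = seq; this.payload = payload; }
    }

    /** Hypothetical API the database would need to expose for a streaming source. */
    interface ChangeLog {
        /** Returns, in order, all changes whose sequence number is greater than afterSeq. */
        List<Change> readAfter(long afterSeq);
    }

    /** Source role: reads from the external system and puts events on the channel. */
    static final class PollingChangeSource {
        private final ChangeLog log;
        private final Queue<Change> channel;
        private long lastSeq = 0; // assumes sequence numbers start at 1

        PollingChangeSource(ChangeLog log, Queue<Change> channel) {
            this.log = log;
            this.channel = channel;
        }

        /** Polls once and returns the number of changes forwarded. */
        int poll() {
            List<Change> batch = log.readAfter(lastSeq);
            for (Change c : batch) {
                channel.add(c);
                lastSeq = c.seq; // remember the position so nothing is re-sent
            }
            return batch.size();
        }
    }

    /** Sink role: takes events off the channel and writes them somewhere. */
    static final class CollectingSink {
        final List<String> written = new ArrayList<>();
        private final Queue<Change> channel;

        CollectingSink(Queue<Change> channel) { this.channel = channel; }

        /** Drains the channel and returns the number of events written. */
        int drain() {
            int n = 0;
            Change c;
            while ((c = channel.poll()) != null) {
                written.add(c.payload);
                n++;
            }
            return n;
        }
    }

    /** In-memory stand-in for a database that exposes a changelog. */
    static final class InMemoryChangeLog implements ChangeLog {
        private final List<Change> changes = new ArrayList<>();

        void append(String payload) {
            changes.add(new Change(changes.size() + 1, payload));
        }

        @Override
        public List<Change> readAfter(long afterSeq) {
            List<Change> out = new ArrayList<>();
            for (Change c : changes) {
                if (c.seq > afterSeq) {
                    out.add(c);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        InMemoryChangeLog db = new InMemoryChangeLog();
        Queue<Change> channel = new ArrayDeque<>();
        PollingChangeSource source = new PollingChangeSource(db, channel);
        CollectingSink sink = new CollectingSink(channel);

        db.append("CREATE (a:Person)");
        db.append("CREATE (b:Person)");
        source.poll();
        sink.drain();

        db.append("CREATE (a)-[:KNOWS]->(b)");
        source.poll(); // forwards only the change added since the last poll
        sink.drain();

        System.out.println(sink.written);
    }
}
```

The two roles are separate classes that only share the channel, so either one could be deployed without the other. The hard part for a real graph database source is the `ChangeLog` side, which depends entirely on what the database itself exposes.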
