I generally agree. My true preference would be to have the Flume (proper)
artifact not have any dependency on source, channel, or sink systems (e.g.
Hadoop, HBase, Cassandra, MySQL, JMS, etc.). These components would be built
separately and installed into Flume. In other words, I would like Flume to
look more like Apache HTTPd: a few critical plugins ship with it, prebuilt,
but most exist outside of the primary artifact.
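
To make that concrete, here is a rough sketch of the shape I have in mind.
All of the names (EventSink, CassandraSinkPlugin, PluginLoader) are made up
for illustration and are not Flume's actual API. The point is only that the
core knows an interface and a class name from configuration, while the
implementation is built and shipped separately:

```java
// Hypothetical sketch; none of these types exist in Flume under these names.

// Lives in the core artifact: the only thing the core knows about sinks.
interface EventSink {
    String describe();
}

// Would live in a separately built plugin jar, not in the core artifact.
class CassandraSinkPlugin implements EventSink {
    @Override
    public String describe() {
        return "cassandra sink (plugin)";
    }
}

// Lives in the core artifact: instantiates a plugin from a class name,
// e.g. one read from a configuration file.
class PluginLoader {
    static EventSink load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            // asSubclass rejects classes that do not implement EventSink.
            return cls.asSubclass(EventSink.class)
                      .getDeclaredConstructor()
                      .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load plugin " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name stands in for a value taken from configuration.
        EventSink sink = load(CassandraSinkPlugin.class.getName());
        System.out.println(sink.describe());
    }
}
```

In a real build the plugin class would sit in its own jar on the classpath,
so the core compiles without it.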

As for whether those "non-core" plugins live in the Flume repo in a contrib
module, I have mixed feelings. The advantage is that they receive greater
visibility. The cost is that they take on Flume's update cycle, which can
delay the module author's ability to track the system with which they
integrate (e.g.
Cassandra). Hadoop, ZooKeeper, Apache HTTPd, and a thousand other projects
have gone through this and it seems like there's never a good answer. I'd
be in favor of having a contrib module for now, as a home for these plugins
while Flume is growing. My guess is that it will become unruly after some
time, though.
Borrowing from Mubarak's suggestion, maybe we can do something like:
...
contrib/
  sources/
  channels/
  sinks/
    cassandra/
    voldemort/
  tools/
  examples/
In other words, mirror the top level directory under contrib.

I will insist that Flume proper have absolutely no dependency on a contrib
component. If Flume needs one, that component should be included in Flume
proper. I think we already have an overly complex dependency graph between
modules.
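
A rule like this is easy to check mechanically. Below is a minimal sketch,
assuming the module graph is available as a map from module name to its
direct dependencies and that contrib modules are identified by a "contrib/"
prefix. The module names and the prefix convention are assumptions for
illustration, not the actual Flume build layout:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical check: flags any non-contrib module that directly depends
// on a contrib module. Only direct dependencies are inspected.
class ContribDependencyCheck {
    static final String CONTRIB_PREFIX = "contrib/"; // assumed naming convention

    static List<String> violations(Map<String, List<String>> deps) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : deps.entrySet()) {
            if (entry.getKey().startsWith(CONTRIB_PREFIX)) {
                continue; // contrib modules may depend on anything
            }
            for (String dep : entry.getValue()) {
                if (dep.startsWith(CONTRIB_PREFIX)) {
                    found.add(entry.getKey() + " -> " + dep);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up module graph for illustration.
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("flume-core", new ArrayList<>());
        deps.put("contrib/sinks/cassandra", List.of("flume-core"));
        System.out.println(violations(deps)); // empty list means the rule holds
    }
}
```
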
On Sat, Jun 16, 2012 at 6:32 PM, Hari Shreedharan <[email protected]
> wrote:
> Hi Flume devs,
>
> I would like to propose moving several pieces of code out to a different
> contrib module which are pieces of code not essential to flume
> functionality but pluggable into some Flume framework. Examples of this are
> the various kinds of interceptors, event serializers etc. I don't think we
> should keep them in the code tree as the interceptor framework or the
> HBase/HDFS sinks - as this adds several files into the main module. When
> recently a new serializer was being contributed for the HbaseSink, I really
> wanted it checked in, and I checked it into the main module - since I
> didn't know where else it should go. It didn't make much sense to create a
> new package either.
>
> It is really great that people are implementing features and contributing
> it back to Flume. This is an example of Flume maturing as a project. As a
> result, we need to define a place where such contributions go. We could
> keep them in the same package to prevent bc issues, but move them out to
> another module.
>
> What do you guys think?
>
>
> Thanks!
> Hari
>
--
Eric Sammer
twitter: esammer
data: www.cloudera.com