On 11/25/2013 12:14 AM, Juhani Connolly wrote:
Hey guys,

What I write here is all just my personal opinion and I'm writing in
hopes of starting a discussion and/or getting feedback. I know I've not
been very active on the project recently (due to other engagements) but
do still want it to succeed and hope to find more time for it eventually.

Right now I see new/active issues for the addition of Redis and Kafka
sinks, and while they're nice features, I'm personally concerned about
feature bloat of the project. There are dozens of interceptors, sinks
and sources that can be thought of, but most of them are tied to a
single, narrow use case.

Every time we add a new component we're also committing to maintaining
it over future releases, even if the original contributor gets too busy
for it. The more such components get added, the more we will get
distracted from improving core features and getting rid of issues
affecting them.

For these reasons I generally haven't submitted components we developed
for internal use (because they are too specific to our use cases), only
passing back fixes that address bugs or apply to the core project.

For these reasons I think we may want to consider either a) being more
selective regarding additional component submissions or b) adding a
contrib directory to the project which includes the components but
doesn't guarantee ongoing maintenance or compatibility.

On the flip side of course, taking approaches like this may discourage
new contributors and could thus be considered a negative, and if many
people feel this way they should definitely share their thoughts.

I'd be curious to know what others think, and what direction they hope
to take the project in the future.


Hi,

I should probably chime in since I submitted the patch for the Redis sink.

I see the arguments about keeping Apache Flume lean, but I am not sure their benefits outweigh their costs.

As a user, having Apache Flume able to speak to multiple sources and sinks is a big plus. Having to shop around for various sources/sinks is more troublesome, since I first have to find which flavor of a given sink is being maintained today, then deal with licenses, incompatibilities, mismatched versions, upgrades, deployment and unfixed bugs, all while wondering if this is even going to work at all. Knowing a piece of code is in Apache Flume puts my mind at ease since the license is clear, the CLA is cleared and the code has been reviewed. There may be some expectations regarding its support and quality, but that should be fine as long as the level of support is clearly stated and labeled (see the contrib idea, or tagging components with labels such as "supported" or "experimental"). This also gives more opportunities for bugs to be fixed, and therefore for the code to be better maintained, due to the wider audience of Apache Flume in comparison to a random small project on GitHub. Also, as a user, I would have to be fairly technical to use a random source/sink outside of Apache Flume. I would probably have to build it, qualify it against my version of Apache Flume, and package it for deployment. Whereas if it is in Apache Flume, it's either already in the tarball or already in the package of my favorite Apache Flume distribution.


As a developer, Apache Flume is very flexible since I can pick and choose most parts. But if I have to write my own source and/or my own sink, I may be tempted to forgo Apache Flume altogether and write the rest myself for my specific use case. And if I do write a source for my use case, I don't have much incentive to make it public or to keep it working with the current Apache Flume version. I just need to ensure it works for my version of Apache Flume. Everything else is just extra work. Also, in the context of a company, I would rather target my source/sink at one of the vendor-supported versions of Apache Flume, which may be different from the latest Apache Flume. I would have no incentive to go through the effort of testing it against the latest Apache Flume. If my source/sink were in Apache Flume, I would be more interested in contributing to Apache Flume since I know the changes would trickle down at some point and make my life easier.

As an Apache Bigtop contributor, having all these projects spread around scares me. They will all depend on different versions of Apache Flume, build in different ways, work in different ways and integrate in their own way. Sending patches upstream will also be troublesome since we would now have to talk to and work with a lot more people than just the Apache Flume folks, each of them with different schedules and ways of working.


In conclusion, I believe having a diverse set of sources, sinks and channels may not be a bad idea. If such a piece is not maintained and no one is willing to maintain it, then I don't see why it could not be removed.

In order to prevent a source/sink/channel from rotting, besides creating a contrib area, we could also do the following:
* Tag components based on their known quality and stability
* Be strict about unit tests
* Maybe require some integration tests as well


Thanks,
Bruno
