Sorry, I missed a couple things at the end (inline)

On Mar 4, 2012, at 11:29 PM, Mike Percy wrote:

> On Mar 4, 2012, at 9:52 PM, Juhani Connolly wrote:
>
>> In the "poor code reviews" discussion, Mike Percy suggested opening up a
>> thread regarding the roadmap for 1.1.0 and beyond, so here's a go at kicking
>> that off.
>>
>> I think the following questions present themselves, along with my opinions:
>>
>> - When do we hope to make the next solid release? Do we have a planned
>> schedule (that I may be unaware of)?
>> Personally I am not too attached to deciding a date in advance and would
>> prefer to decide a fixed set of issues that we prioritize to fix, then limit
>> the branch to bug fixes only (moving any further dev to a separate branch),
>> and push that out as the next release when sufficient testing has been done
>> with harmful bugs removed.
>
> I'd be inclined to try to release as often as we think we have useful
> features and bug fixes implemented, to maintain a rhythm and keep the
> vitality of the project high. I think releasing often also helps encourage
> users to engage with the developer community and try out and vet experimental
> features.
>
>> - What belongs in 1.1.0?
>> I for one think that for any log delivery infrastructure the core parts for
>> delivery mechanisms and error recovery mechanisms should be of primary
>> importance, and this is what I've been trying to work on. I do not feel that
>> any further sources or sinks are necessary, but feel that for delivery
>> mechanisms, the lack of a FileChannel is pretty painful. I also feel that a
>> buffering mechanism (as in scribed), allowing us to store channel overflow in
>> a long-term medium, should be a priority.
>
> I tend to agree with what you're saying, although I don't really have an
> aversion to integrating more Sinks as long as they have maintainers. I agree
> that a long term buffering solution is very important, I think that would be
> part of FileChannel though.
> Overall I think we should strive for correctness
> in the core, medium term API stability, and system speed, in that order for
> the next release. The primary thing I am looking at right now is the RPC
> mechanism, to ensure we are set up to take full advantage of Avro RPC
> performance features and ensure that remote clients can integrate with Flume
> in the future. I have some concerns there and I'll start a thread about it
> tomorrow probably, since if there are reasons to break wire compatibility we
> should do it as early as possible in the life of 1.x. (incidentally I also
> think we should start calling it 1.x instead of NG to avoid coining terms
> like Flume ONG and Flume NNG for 2.x :)
>
> Along the vein of system interfaces, one big thing that I think is missing in
> Flume is Javadoc of all the core interfaces and classes. This is something I
> am certainly willing to work on. Mainly I believe that the various interface
> contracts need to be strongly specified in the base class Javadoc so that
> it's easier to tell if something is wrong and to ensure consistency across
> implementations. For example, if there is an error delivering an event should
> a Sink return BACKOFF or throw an EventDeliveryException? I'm not sure why
> one is a return value and the other is an exception, but we should make sure
> consequences and best practices are documented, and any Sinks in the core
> should be consistent. I'm still getting my head around the system and using
> the source (, Luke) to figure these things out. But hopefully future devs and
> API users won't have to do that as much.
>
> One more thing that I think is important, while not really related to a
> software release per se, is coming up with stories around how common use
> cases are supposed to work or eventually be possible. Something I've been
> thinking about a lot is Apache web server log collection onto HDFS.
> While tail source is known to be problematic (deserves a FAQ entry), we should
> provide explanations and best practices for the most common cases. (In this
> case I think it involves writing an apache httpd mod_flume module that speaks
> Avro). We can then eventually provide code for these most common cases when
> we have time to implement them or as they are contributed. These very common
> use cases and the stories around them should inform our design decisions.
>
>> I am unsure of configuration overhauls. We have one configuration method
>> that works. Should a centralized one be an immediate target or one for
>> 1.1.0? Should refactoring the configuration be a priority (it was pointed
>> out that FlumeConfiguration has become a god class)?
>
> OK so my understanding is that some changes to how we do config validation
> are required to be able to write a tool to validate Flume configs without
> having to start an agent. The idea is for this functionality to be separated
> from the core to some extent so that the validation mechanism can be exposed
> as an API. The initial request for an API came from the Cloudera enterprise
> team, who want to add Flume configuration validation support in the Cloudera
> Manager app. Personally I think it would be a great feature to have in a
> command line tool as well. From an operations perspective, it's nice to have
> the ability to check that your config is valid before pushing it, instead of
> finding out your config is broken once you deploy to all your agents…
> especially if you are in an emergency production situation and you need to
> make changes fast. If you have concerns about the implementation beyond the
> issues that Eric raised, or even if you agree/disagree with the current
> feedback on the review, then I know Hari would appreciate any constructive
> feedback that you or other folks can provide.
> Of course if folks think that
> it's an undesirable feature, have concerns, or think there is a better way to
> design it then they should definitely speak up in the JIRA, the review tool,
> or here as well.
>
> Anyway, I think other folks should chime in on this thread and we should
> ultimately morph this discussion into a list of JIRAs for inclusion into a
> 1.1.0. And I would advocate that the rest would move to 1.2.0 by default.
>
>> There are a few other leftovers from flume-728: metric collection
>> infrastructure, documentation, master. Should these be targets for 1.1.0 or
>> for further down the road?
>> We should probably also make clear which components need to be thread safe
>> and which don't. We should also verify this is the case.

What do you mean by Master?

+1 on documenting thread safety and providing much more documentation in
general.

I'm not sure about exposing metrics for 1.1.0… while it's important for folks
running Flume and we should make it a high priority, I think we could probably
provide enough value with more important stuff to justify a next release
without it, if we are releasing frequently. Then again if someone wanted to
work on JMX support or something like that I wouldn't be against it!

Regards,
Mike
