Hi, Yeah I'd prefer to avoid overriding getLifeCycle to do this since getters are typically lightweight. As such, I feel it violates the general contract behind getters.
In regards to channels/sinks/pollable sources, I don't think they need this behavior as sinks and poll-able sources both have threads polling them at regular intervals. We have two robust channels memory and file which don't require this behavior as well. Given this, I feel like this is limited to EventDrivenSources which don't have a thread polling them. On second thought, FLUME-1630 got rid of one layer of threads in the flume agent, perhaps adding a single threaded scheduled executor to EventDrivenSource is not a bad solution. Brock On Wed, Jan 16, 2013 at 11:02 PM, Connor Woodson <[email protected]> wrote: > What I was trying to say is that the ScribeSource should not be responsible > for restarting itself (just from my understanding of your idea; it breaks > the existing paradigm as components should not have to control their own > lifecycle). Going from Brock's link, I feel the most dynamic solution would > be possible to just add in a lifecyle state RESTART and place it in that > switch statement; when that state is reached, it tries to stop then restart > the component and then sets the desired state to START (or STOP if it > couldn't start it again, or set error=true if it couldn't stop it). > > And in a way to prevent overriding getLifecycleState to return RESTART, > there could either be an inherited function from AbstractSource to call for > a restart, there could maybe be an option to set it to RESTART on crash, or > something else. > > I'll admit though that I know little about the lifecycle system, so I have > no idea if this idea is any better. > > - Connor > > > On Wed, Jan 16, 2013 at 10:21 PM, Juhani Connolly < > [email protected]> wrote: > >> Sink, Source and Channel all extend LifecycleAware, so the function is >> available to all components already. >> >> I was more questioning whether it was reasonable to start including logic >> to determine the state. That being said, I think the precedent of just >> returning the state set by start/stop is more one of habit, so on further >> thought I don't see it as being unreasonable. I'm going to give fixing >> ScribeSource with it a poke. >> >> As to new lifecycle states, I took a pass at reworking the lifecycle model >> with guavas service implementation in the past, but it took some very >> significant changes and didn't get the momentum/interest necessary to keep >> working on it. That ticket is here https://issues.apache.org/** >> jira/browse/FLUME-966 <https://issues.apache.org/jira/browse/FLUME-966> . >> Brock might be working on it now though the issue doesn't appear to have >> had attention since october. >> >> >> On 01/17/2013 03:01 PM, Connor Woodson wrote: >> >>> Why limit it to the sources? If there is going to be a change to one >>> component's lifecycle, then I see no reason not to change every >>> component's >>> lifecycle. >>> >>> Sinks and Channels could very well have this problem; so what about giving >>> each LifecycleAware component a takePulse method (or something); or to >>> avoid creating a new method, add a new lifecycle state 'crashed' or such >>> which, when detected, causes a restart of the component. Then components >>> would just need to override getLifecycleState (if this method is polled >>> regularly; I don't know. maybe use a listener for when there's a state >>> change) to detect if it has crashed/needs to be restarted. >>> >>> Just my thoughts, >>> >>> - Connor >>> >>> >>> On Wed, Jan 16, 2013 at 9:08 PM, Juhani Connolly < >>> [email protected].**jp <[email protected]>> >>> wrote: >>> >>> Hmm, overriding the implementation of getLifecycleState provided by >>>> AbstractSource could work. It would be going against the convention that >>>> has been maintained in all other components(that I can think of) >>>> >>>> >>>> On 01/17/2013 01:20 PM, Brock Noland wrote: >>>> >>>> Hi, >>>>> >>>>> Yes I can definitely see the issue. It sucks that we'd have to add yet >>>>> another thread. An alternative which wouldn't require another thread >>>>> would be to check the optional interface in the supervisor, >>>>> approximately here: >>>>> >>>>> https://github.com/apache/****flume/blob/trunk/flume-ng-**<https://github.com/apache/**flume/blob/trunk/flume-ng-**> >>>>> core/src/main/java/org/apache/****flume/lifecycle/** >>>>> LifecycleSupervisor.java#L240<**https://github.com/apache/** >>>>> flume/blob/trunk/flume-ng-**core/src/main/java/org/apache/** >>>>> flume/lifecycle/**LifecycleSupervisor.java#L240<https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java#L240> >>>>> > >>>>> >>>>> >>>>> However, I am not sold on the supervisor being the best place to fix >>>>> this as I am not sure that other lifecycle components would need this. >>>>> >>>>> >>>>> >>>>> Brock >>>>> >>>>> On Wed, Jan 16, 2013 at 7:45 PM, Juhani Connolly >>>>> <juhani_connolly@cyberagent.****co.jp <http://co.jp> < >>>>> juhani_connolly@cyberagent.**co.jp <[email protected]>>> >>>>> >>>>> wrote: >>>>> >>>>> I came upon an issue with ScribeSource, though it's theoretically >>>>>> applicable to any EventDrivenSource whose event generating thread(s) >>>>>> die. >>>>>> Simple put, sending a bad packet to the thrift(scribe protocol) port >>>>>> will >>>>>> result in it trying to allocate space for some arbitrarily large packet >>>>>> resulting in an OOMException which kills the thread(incidentally I >>>>>> thought >>>>>> this would be an issue in avro too, but it throws an exception before >>>>>> making >>>>>> excessive allocation requests). >>>>>> >>>>>> As far as flume is concerned, the component is still alive. stop() was >>>>>> never >>>>>> called, so even monitoring the component state using jmx will not >>>>>> notice >>>>>> anything wrong. This situation occurs from user error, but there is >>>>>> potential for other errors leaving a zombie component. I think it would >>>>>> be >>>>>> more user friendly to be able to recover from such errors. >>>>>> >>>>>> I'm thinking of adding a StatusPollable interface that >>>>>> EventDrivenSources >>>>>> can optionally implement(because we can't change the interface without >>>>>> a >>>>>> version change). If implemented, the EventDrivenSourceRunner would >>>>>> schedule >>>>>> a regular poll to check the state. Upon failure it could either call >>>>>> stop() >>>>>> to signal it broke. With autoRestartPolicy, the source would then get >>>>>> restarted by its supervisor. >>>>>> >>>>>> Would appreciate any opinions before I put together a patch/post an >>>>>> issue. >>>>>> >>>>>> >>>>> >>>>> >> -- Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
