Sink, Source and Channel all extend LifecycleAware, so the function is available to all components already.

I was more questioning whether it was reasonable to start including logic to determine the state. That being said, I think the precedent of just returning the state set by start/stop is more one of habit, so on further thought I don't see it as being unreasonable. I'm going to give fixing ScribeSource with it a poke.

As to new lifecycle states, I took a pass at reworking the lifecycle model with guavas service implementation in the past, but it took some very significant changes and didn't get the momentum/interest necessary to keep working on it. That ticket is here https://issues.apache.org/jira/browse/FLUME-966 . Brock might be working on it now though the issue doesn't appear to have had attention since october.

On 01/17/2013 03:01 PM, Connor Woodson wrote:
Why limit it to the sources? If there is going to be a change to one
component's lifecycle, then I see no reason not to change every component's
lifecycle.

Sinks and Channels could very well have this problem; so what about giving
each LifecycleAware component a takePulse method (or something); or to
avoid creating a new method, add a new lifecycle state 'crashed' or such
which, when detected, causes a restart of the component. Then components
would just need to override getLifecycleState (if this method is polled
regularly; I don't know. maybe use a listener for when there's a state
change) to detect if it has crashed/needs to be restarted.

Just my thoughts,

- Connor


On Wed, Jan 16, 2013 at 9:08 PM, Juhani Connolly <
[email protected]> wrote:

Hmm, overriding the implementation of getLifecycleState provided by
AbstractSource could work. It would be going against the convention that
has been maintained in all other components(that I can think of)


On 01/17/2013 01:20 PM, Brock Noland wrote:

Hi,

Yes I can definitely see the issue. It sucks that we'd have to add yet
another thread. An alternative which wouldn't require another thread
would be to check the optional interface in the supervisor,
approximately here:

https://github.com/apache/**flume/blob/trunk/flume-ng-**
core/src/main/java/org/apache/**flume/lifecycle/**
LifecycleSupervisor.java#L240<https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/lifecycle/LifecycleSupervisor.java#L240>

However, I am not sold on the supervisor being the best place to fix
this as I am not sure that other lifecycle components would need this.



Brock

On Wed, Jan 16, 2013 at 7:45 PM, Juhani Connolly
<juhani_connolly@cyberagent.**co.jp <[email protected]>>
wrote:

I came upon an issue with ScribeSource,  though it's theoretically
applicable to any EventDrivenSource whose event generating thread(s) die.
Simple put, sending a bad packet to the thrift(scribe protocol) port will
result in it trying to allocate space for some arbitrarily large packet
resulting in an OOMException which kills the thread(incidentally I
thought
this would be an issue in avro too, but it throws an exception before
making
excessive allocation requests).

As far as flume is concerned, the component is still alive. stop() was
never
called, so even monitoring the component state using jmx will not notice
anything wrong. This situation occurs from user error, but there is
potential for other errors leaving a zombie component. I think it would
be
more user friendly to be able to recover from such errors.

I'm thinking of adding a StatusPollable interface that EventDrivenSources
can optionally implement(because we can't change the interface without a
version change). If implemented, the EventDrivenSourceRunner would
schedule
a regular poll to check the state. Upon failure it could either call
stop()
to signal it broke. With autoRestartPolicy, the source would then get
restarted by its supervisor.

Would appreciate any opinions before I put together a patch/post an
issue.




Reply via email to