EventDrivenSource and dead threads

Juhani Connolly Wed, 16 Jan 2013 19:46:08 -0800

I came upon an issue with ScribeSource, though it's theoretically applicable to any EventDrivenSource whose event-generating thread(s) die. Simply put, sending a bad packet to the thrift (scribe protocol) port will result in it trying to allocate space for some arbitrarily large packet, resulting in an OutOfMemoryError which kills the thread (incidentally, I thought this would be an issue in avro too, but it throws an exception before making excessive allocation requests).
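
To illustrate the failure mode, here is a minimal sketch of a length-prefixed frame read that rejects an oversized declared length before allocating anything. It is not the actual ScribeSource or Thrift code; the FrameReader class, the readFrame method and the 16 MB limit are made up for illustration, and it assumes a 4-byte big-endian length prefix as used by framed thrift transports.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Flume or Thrift.
class FrameReader {
    // Arbitrary illustrative limit; a real source would make this configurable.
    static final int MAX_FRAME_BYTES = 16 * 1024 * 1024;

    // Reads one frame: a 4-byte big-endian length followed by that many bytes.
    static byte[] readFrame(InputStream in, int maxFrameBytes) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        // Check the declared length before allocating the buffer, so that
        // garbage input cannot trigger a huge allocation.
        if (length < 0 || length > maxFrameBytes) {
            throw new IOException("Rejected frame with declared length " + length);
        }
        byte[] payload = new byte[length];
        data.readFully(payload);
        return payload;
    }
}
```

With a check like this, a stray non-thrift request (for example the bytes of "GET ", which decode to a length of over a billion) fails with an ordinary IOException that the reading thread can catch, instead of an allocation failure that kills it.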

As far as flume is concerned, the component is still alive. stop() was never called, so even monitoring the component state using jmx will not notice anything wrong. This situation arises from user error, but there is potential for other errors to leave a zombie component. I think it would be more user friendly to be able to recover from such errors.

I'm thinking of adding a StatusPollable interface that EventDrivenSources can optionally implement (because we can't change the interface without a version change). If it is implemented, the EventDrivenSourceRunner would schedule a regular poll to check the state. Upon failure it could call stop() to signal that the source broke. With autoRestartPolicy, the source would then get restarted by its supervisor.
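
A rough sketch of how the polling could look, to make the idea concrete. Nothing here exists in flume: StatusPollable and its isHealthy() method are only the proposed interface, and DemoSource and StatusPoller are hypothetical stand-ins for a real source and for the scheduling that EventDrivenSourceRunner would do.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Proposed optional interface (hypothetical, not part of flume).
interface StatusPollable {
    boolean isHealthy();
}

// Stand-in for an event-driven source with a single worker thread.
class DemoSource implements StatusPollable {
    private final Thread worker;
    final AtomicBoolean stopped = new AtomicBoolean(false);

    DemoSource(Runnable work) {
        worker = new Thread(work, "demo-source-worker");
    }

    void start() { worker.start(); }

    void stop() {
        stopped.set(true);
        worker.interrupt();
    }

    Thread worker() { return worker; }

    // Healthy only while the event-generating thread is still running.
    @Override
    public boolean isHealthy() { return worker.isAlive(); }
}

// Stand-in for the poll the runner would schedule.
class StatusPoller {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    void watch(DemoSource source, long periodMillis) {
        scheduler.scheduleAtFixedRate(() -> {
            // A dead worker with no stop() call is the zombie case:
            // stop the source so a supervisor can see it and restart it.
            if (!source.isHealthy() && !source.stopped.get()) {
                source.stop();
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() { scheduler.shutdownNow(); }
}
```

Sources that don't implement the interface would simply not be polled, so existing sources keep working unchanged.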


Would appreciate any opinions before I put together a patch/post an issue.
