Re: [heka] Deadlock due to busy filter creating pipelinepacks?

Rob Miller Sat, 26 Jul 2014 12:24:40 -0700

Hi!

What you've hit here isn't a bug, but just one of the facts of life in working w/ Heka and how it's currently designed. There's a static pool of packs, and if those packs are exhausted then you're going to deadlock b/c everybody will be waiting for new packs and none are being freed. Similarly, if the traffic flowing through the router backs up, then things will come to a stop b/c, well, just about everything flows through the router.
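To make the pool-exhaustion failure concrete, here is a minimal Go sketch of a fixed pool built on a buffered channel. `Pack`, `NewPool`, `TryGet` and `grabWithoutRecycling` are hypothetical names, not Heka's API. The only point is that a fixed number of packs exists, and once all of them are handed out and nothing recycles them, any further request has nothing to receive.

```go
package main

import "fmt"

// Pack is a hypothetical stand-in for Heka's PipelinePack; only the
// recycling behavior matters for this sketch.
type Pack struct {
	Payload string
	pool    chan *Pack // the pool this pack returns to on Recycle
}

// Recycle clears the pack and hands it back to its pool.
func (p *Pack) Recycle() {
	p.Payload = ""
	p.pool <- p
}

// NewPool pre-allocates size packs up front; no more are ever created.
func NewPool(size int) chan *Pack {
	pool := make(chan *Pack, size)
	for i := 0; i < size; i++ {
		pool <- &Pack{pool: pool}
	}
	return pool
}

// TryGet returns a free pack if there is one. A real plugin would do a
// blocking receive (<-pool) instead, and that receive is where it would hang.
func TryGet(pool chan *Pack) (*Pack, bool) {
	select {
	case p := <-pool:
		return p, true
	default:
		return nil, false
	}
}

// grabWithoutRecycling keeps taking packs until the pool is empty, modelling
// a pipeline where packs are handed out but nothing downstream frees them.
func grabWithoutRecycling(pool chan *Pack) int {
	held := 0
	for {
		if _, ok := TryGet(pool); !ok {
			return held
		}
		held++
	}
}

func main() {
	pool := NewPool(100) // 100 is the pool size mentioned in the original report
	fmt.Println("packs handed out before the pool ran dry:", grabWithoutRecycling(pool))
}
```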

The good news is that we have yet to find a case where this is actually a show-stopper. Once you know that these are the pitfalls, you can write your plugin code and structure your pipeline in a way that keeps data flowing.

The first thing to point out is that the size of the PipelinePack pool is configurable; see http://hekad.readthedocs.org/en/v0.6.0/config/index.html#global-configuration-options. So if you really have a case where one plugin is going to need to consume a large number of packs at a single time, then you can crank the number of available packs way up. This will increase the resident memory size of the running hekad process, but it will give you the headroom you need to not cause everything else to block due to running out of packs.

Really, though, while there are cases where you need a lot of packs, those are pretty rare. Usually you can structure things so that you free packs up as you use them. That is, instead of having a filter ask for dozens of packs, populate them, and then start injecting them into the router, you can ask for one, populate it, inject it, and then ask for the next one. I don't know enough about your use case to know if such adjustments would help, though. To that end I *would* be interested in you coding up an example filter that causes the issue; I might be able to suggest changes that would alleviate the back-pressure. Then you can let us know if you have other constraints that prevent you from making those changes in your production code.
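A rough sketch of the one-at-a-time approach, assuming a pool modelled as a buffered channel and a downstream consumer that recycles each pack after handling it. All names here are hypothetical stand-ins; in a real filter you would use whatever calls you already use to request and inject packs. It shows that far more messages than the pool holds can be flushed, as long as each pack is freed before the next one is requested.

```go
package main

import "fmt"

// Pack and NewPool are hypothetical stand-ins for Heka's pack pool.
type Pack struct {
	Payload string
	pool    chan *Pack
}

func (p *Pack) Recycle() {
	p.Payload = ""
	p.pool <- p
}

func NewPool(size int) chan *Pack {
	pool := make(chan *Pack, size)
	for i := 0; i < size; i++ {
		pool <- &Pack{pool: pool}
	}
	return pool
}

// flushOneAtATime asks for a pack, populates it, injects it, and only then
// asks for the next one, so it never holds more than one pack at a time.
func flushOneAtATime(items []string, pool chan *Pack, inject chan<- *Pack) {
	for _, item := range items {
		p := <-pool // waits until a downstream consumer has recycled a pack
		p.Payload = item
		inject <- p
	}
}

// runOutput stands in for the router plus output: it processes each
// injected pack, recycles it, and reports how many it saw.
func runOutput(inject <-chan *Pack, done chan<- int) {
	n := 0
	for p := range inject {
		n++
		p.Recycle()
	}
	done <- n
}

// deliver flushes items through a pool of poolSize packs and returns the
// number of packs the output processed.
func deliver(items []string, poolSize int) int {
	pool := NewPool(poolSize)
	inject := make(chan *Pack)
	done := make(chan int)
	go runOutput(inject, done)
	flushOneAtATime(items, pool, inject)
	close(inject)
	return <-done
}

func main() {
	items := make([]string, 1000)
	for i := range items {
		items[i] = fmt.Sprintf("key-%d", i)
	}
	// 1000 items pass through a pool of only 2 packs because each pack is
	// freed before the next one is requested.
	fmt.Println("delivered:", deliver(items, 2))
}
```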

Another thing that I find curious is that apparently your aggregation step is taking a long time, so long that everything backs up while it's happening. This seems like an unusual situation. Usually aggregation happens as the data flows, and the periodic flushes don't actually have to do much other than inject the data that's been accumulating. Again, I don't know enough about your particular use case to say whether this is easily changed, but ideally the aggregation step wouldn't be such a heavy "stop-the-world" kind of activity.
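Aggregating as the data flows could look roughly like this hypothetical `Aggregator`: each message is folded into a running total when it arrives, so the periodic flush only swaps out the accumulated map. It assumes additive integer values keyed by string, which is a guess about the key-value stream described in the original report.

```go
package main

import (
	"fmt"
	"sort"
)

// Aggregator is a hypothetical filter state that folds each message in as
// it arrives, so a flush only has to hand over totals that already exist.
type Aggregator struct {
	totals map[string]int64
}

func NewAggregator() *Aggregator {
	return &Aggregator{totals: make(map[string]int64)}
}

// Add is called once per incoming message and does constant work.
func (a *Aggregator) Add(key string, value int64) {
	a.totals[key] += value
}

// Flush returns the accumulated totals and starts a fresh map, so the
// periodic tick does no computation beyond swapping maps.
func (a *Aggregator) Flush() map[string]int64 {
	out := a.totals
	a.totals = make(map[string]int64)
	return out
}

func main() {
	agg := NewAggregator()
	agg.Add("a", 1)
	agg.Add("b", 2)
	agg.Add("a", 3)

	flushed := agg.Flush()
	keys := make([]string, 0, len(flushed))
	for k := range flushed {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map order is random; sort for stable output
	for _, k := range keys {
		fmt.Println(k, flushed[k])
	}
}
```

Emitting the flushed totals still takes one pack per key, so this pairs naturally with requesting and injecting packs one at a time.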

Assuming that can't be improved, your current solution of doing the heavy work in another goroutine so the receiving goroutine doesn't have to block is a fine one. In fact, I've used that often enough that I actually call it the "batch-and-back" pattern. I usually pre-allocate two buffers and pass them back and forth on channels between the goroutines. The receiving goroutine populates the first buffer and, when some batch threshold (size and/or time elapsed) is reached, it passes it to the committing goroutine. The committing goroutine grabs this, does its work, and then drops the (now "empty") buffer on the return channel for reuse by the receiving goroutine. You can see this in action in the FileOutput: https://github.com/mozilla-services/heka/blob/dev/plugins/file/file_output.go.
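A minimal sketch of the batch-and-back pattern as described: two pre-allocated buffers, one channel carrying full buffers to the committing goroutine, and a return channel bringing emptied buffers back. This is not the FileOutput code; `batchAndBack`, `batchSize` and the `commit` callback are illustrative assumptions, and only a size threshold is implemented.

```go
package main

import "fmt"

// batchSize is a hypothetical size threshold. A time threshold could be
// added by selecting on a time.Ticker alongside the input channel.
const batchSize = 3

// batchAndBack sketches the two-buffer hand-off: the receiving loop fills one
// buffer while the committing goroutine works on the other, and emptied
// buffers come back on a return channel for reuse. commit must not keep a
// reference to the slice it is given, since that buffer will be reused.
func batchAndBack(in <-chan string, commit func([]string)) {
	full := make(chan []string)     // receiver -> committer
	empty := make(chan []string, 2) // committer -> receiver (return channel)
	done := make(chan struct{})

	// Pre-allocate exactly two buffers.
	empty <- make([]string, 0, batchSize)
	empty <- make([]string, 0, batchSize)

	// Committing goroutine: do the slow work, then hand the buffer back.
	go func() {
		for batch := range full {
			commit(batch)
			empty <- batch[:0]
		}
		close(done)
	}()

	// Receiving loop: never does the slow work itself.
	buf := <-empty
	for msg := range in {
		buf = append(buf, msg)
		if len(buf) >= batchSize {
			full <- buf
			buf = <-empty
		}
	}
	if len(buf) > 0 {
		full <- buf // flush the final partial batch
	}
	close(full)
	<-done
}

func main() {
	in := make(chan string)
	go func() {
		for i := 1; i <= 7; i++ {
			in <- fmt.Sprintf("msg-%d", i)
		}
		close(in)
	}()
	batchAndBack(in, func(batch []string) {
		fmt.Println("committing", len(batch), "messages:", batch)
	})
}
```

Because only two buffers ever exist, the receiver can run at most one batch ahead of the committer, which bounds memory while still keeping the receiving side from stalling on every commit.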

Also, yes, it is possible to bypass the router entirely by delivering packs directly to a specific filter or output, assuming you know the name of the registered plugin. The functions to support this are available on the PluginHelper interface, passed in to each filter's Run method:

https://github.com/mozilla-services/heka/blob/dev/pipeline/config.go#L46

If you do go this route (no pun intended), I recommend you have the output name specified as part of your filter's config and not hard-coded.
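The lookup-by-name idea might look roughly like the following. The `Helper` and `Runner` interfaces here are hypothetical and only loosely modelled on what the message describes; the actual methods on PluginHelper should be taken from the linked config.go. The sketch resolves the target once, from a name held in the filter's config, rather than hard-coding it.

```go
package main

import "fmt"

// Pack is a hypothetical stand-in for Heka's PipelinePack.
type Pack struct{ Payload string }

// Runner and Helper are hypothetical interfaces loosely modelled on the
// lookup-by-name facility described for PluginHelper; check config.go for
// the real method names and signatures.
type Runner interface {
	InChan() chan *Pack
}

type Helper interface {
	Output(name string) (Runner, bool)
}

// FilterConfig carries the target name from config instead of hard-coding it.
type FilterConfig struct {
	TargetOutput string
}

// resolveTarget looks up the configured output once, typically at startup,
// and returns the channel packs can be delivered to directly.
func resolveTarget(h Helper, cfg FilterConfig) (chan *Pack, error) {
	r, ok := h.Output(cfg.TargetOutput)
	if !ok {
		return nil, fmt.Errorf("no output registered as %q", cfg.TargetOutput)
	}
	return r.InChan(), nil
}

// fakeRunner and registry exist only so the sketch runs on its own.
type fakeRunner struct{ in chan *Pack }

func (f fakeRunner) InChan() chan *Pack { return f.in }

type registry map[string]fakeRunner

func (reg registry) Output(name string) (Runner, bool) {
	r, ok := reg[name]
	if !ok {
		return nil, false
	}
	return r, true
}

func main() {
	reg := registry{"CassandraOutput": {in: make(chan *Pack, 1)}}
	target, err := resolveTarget(reg, FilterConfig{TargetOutput: "CassandraOutput"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	target <- &Pack{Payload: "aggregated row"} // bypasses any router
	fmt.Println("output received:", (<-target).Payload)
}
```

Resolving the name at startup also means a typo in the config surfaces as an error immediately rather than as silently undelivered packs later.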

Finally, when you're working in Go, anything is possible. It's probably not necessary, but there's nothing stopping you from instantiating your own set of packs entirely outside of the pools that Heka is already creating. When you call NewPipelinePack, you pass in the channel that the pack will be returned to when Recycle is called, so you could even inject them into the router or pass them on to some other plugin and ultimately they'll be returned to you when they've been processed.
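Here is a self-contained sketch of the private-pack idea. `NewPack` is a hypothetical stand-in that mirrors the described NewPipelinePack behavior of taking the channel a pack returns to on Recycle; it is not Heka's actual constructor. Packs built this way end up back in the plugin's own channel no matter who recycles them.

```go
package main

import "fmt"

// Pack is a hypothetical stand-in for Heka's PipelinePack.
type Pack struct {
	Payload     string
	recycleChan chan *Pack // where this pack goes when recycled
}

// NewPack is a hypothetical constructor: the caller chooses the channel
// the pack returns to when Recycle is called.
func NewPack(recycleChan chan *Pack) *Pack {
	return &Pack{recycleChan: recycleChan}
}

// Recycle clears the pack and sends it back to its owner's channel.
func (p *Pack) Recycle() {
	p.Payload = ""
	p.recycleChan <- p
}

// privatePool builds n packs that return only to this plugin's own
// channel, independent of any shared pool.
func privatePool(n int) chan *Pack {
	ch := make(chan *Pack, n)
	for i := 0; i < n; i++ {
		ch <- NewPack(ch)
	}
	return ch
}

func main() {
	mine := privatePool(4)
	downstream := make(chan *Pack, 4) // stands in for the router or another plugin

	for i := 0; i < 4; i++ {
		p := <-mine
		p.Payload = fmt.Sprintf("item-%d", i)
		downstream <- p
	}
	close(downstream)

	// Whoever processes the packs recycles them; they land back in mine.
	for p := range downstream {
		p.Recycle()
	}
	fmt.Println("packs back in the private pool:", len(mine))
}
```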


Hope this helps,

-r



On Sat 26 Jul 2014 01:51:24 AM PDT, Nimi Wariboko Jr wrote:

Hi,

We've come across an issue stress testing one of our setups. We have a
3 step setup, one (tcp) input, (aggregate) filter, and (cassandra)
output. The way it works is on the input is a stream of key-values,
which are then aggregated on key, and periodically flushed to the
cassandra output.

What we began to see under a stress test was in the periodic aggregate
step, the aggregate filter will begin to flush its memory by
requesting and injecting pipeline packs (up to ~13k pipelinepacks,
multiple times larger than the 100 pool size). Surprisingly,
eventually the pipeline pack pool is exhausted and heka will freeze.

After digging around, it seems the issue is:
1.) A large number of pipeline packs are sent to the aggregate filter.
2.) The aggregate filter begins its aggregate step, and stops
accepting packs from its InChan.
3.) The aggregate filter also begins to request PipelinePacks and
inject them into the stream.
4.) Because the aggregate filter is no longer accepting requests, the
MessageRouter is stuck blocking for the aggregate filter to accept a
message.
5.) Because the MessageRouter isn't routing messages, the injected
packs aren't going to the Cassandra output, and aren't ultimately
being freed.
6.) Because the packs aren't being freed, the pool is eventually exhausted.
7.) Heka freezes even though everybody calls Recycle or Inject.

If this doesn't make sense, I can code up an example filter/input
plugin that should cause the deadlock.

I'm not sure if this is a legitimate bug, or just something we shouldn't
do. Currently we have sidestepped the issue by copying the data and
doing the flush in a separate goroutine (freeing up the filter to
continue to accept packs). Another solution for us would be to
completely sidestep the router and just pass the pipeline pack to the
output directly. The docs claim this is possible, but I'm not entirely
clear on how to achieve this (how do you get the reference to the
target plugin InChan?).

All in all, it seems that heka can freeze if the message router tries
to deliver a pipelinepack to a filter that is in a busy loop
requesting & injecting pipeline packs.

Thanks,
Nimi Wariboko Jr.


_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
