Rob,

The queue files were filling up so quickly because of how I had configured the 
LogstreamerInput that harvests Apache log entries.  That input was processing a 
total of 38 Apache logs, so when Heka started up it almost certainly hit an 
initial surge of log entries, which filled the queue files.  Once I configured 
the input to use just a single Apache log, Heka settled into steady-state 
operation and performs fine.  I've also been able to confirm that the queue 
files are, indeed, being automatically deleted.  In fact, I ran this 
configuration continuously for several hours with no issues.

FWIW, based on an earlier comment you made about reproducing an issue with the 
queue_full_action=block setting, I've done all my testing with 
queue_full_action=drop.  I look forward to an update that fixes the 'block' 
action not recovering properly.

Again, greatly appreciate your support!

Thanks.

Chris

Thanks so much for this explanation.

On Apr 15, 2015, at 6:08 PM, Rob Miller wrote:


I'll explain how things are supposed to work; hopefully that'll provide some 
context.

When buffering is in play, whether in a TcpOutput or an ElasticSearchOutput, 
*every* message will go through the buffer. Messages are received via the 
plugin's input channel and then (when things are flowing smoothly) immediately 
written to disk at the end of the buffer. There is another goroutine running 
that is constantly pulling records from earlier in the buffer, where the cursor 
points, and trying to send them. If a send is successful, the queue cursor is 
updated to the next record, and the process is repeated. If the send fails, the 
queue cursor doesn't update, and the same record will be retried until the send 
succeeds.
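
As a rough illustration of the cursor behavior described above, here is a 
minimal in-memory sketch. It is not Heka's actual code: the `queue` type, 
`add`, `drain`, and `flakySender` are hypothetical names, the buffer is a 
slice rather than files on disk, and `maxAttempts` exists only so the demo 
terminates.

```go
package main

import (
	"errors"
	"fmt"
)

// queue is a hypothetical in-memory stand-in for the on-disk buffer.
type queue struct {
	records []string // intake appends at the end
	cursor  int      // index of the next record to send
}

// add appends a record to the end of the buffer.
func (q *queue) add(rec string) { q.records = append(q.records, rec) }

// drain tries to send records starting at the cursor. The cursor advances
// only after a successful send; a failed send leaves it in place, so the
// same record is retried. It returns the number of send attempts made.
func (q *queue) drain(send func(string) error, maxAttempts int) int {
	attempts := 0
	for q.cursor < len(q.records) && attempts < maxAttempts {
		attempts++
		if err := send(q.records[q.cursor]); err != nil {
			continue // cursor unchanged: retry the same record
		}
		q.cursor++
	}
	return attempts
}

// flakySender returns a send function that fails its first `failures` calls
// and records every successfully delivered record afterwards.
func flakySender(failures int, delivered *[]string) func(string) error {
	return func(rec string) error {
		if failures > 0 {
			failures--
			return errors.New("downstream unavailable")
		}
		*delivered = append(*delivered, rec)
		return nil
	}
}

func main() {
	q := &queue{}
	for _, r := range []string{"a", "b", "c"} {
		q.add(r)
	}
	var delivered []string
	attempts := q.drain(flakySender(2, &delivered), 100)
	fmt.Println(delivered, attempts, q.cursor)
}
```

In this sketch the two failed sends are both retries of the first record, so 
nothing is skipped and delivery order is preserved.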

When the sending goroutine clears out one queue file and moves on to the next 
one, it is supposed to advance the cursor to the next file and *delete* the 
file that was just finished. You didn't mention explicitly whether the queue 
files are being automatically deleted as they're drained. Are they?

While this is happening, Heka is keeping track of the size of the disk queue. 
When a message is added to the queue, the size increases. When a file is 
drained and deleted, the size goes down. This is all fine unless and until the 
size of the queue hits the specified max size. At that point the behavior is 
determined by `queue_full_action`. The "shutdown" option is self-explanatory: 
Heka shuts down. The "drop" option means that the intake goroutine just drops 
the message on the floor. Messages keep flowing, but the dropped ones are never 
added to the queue, so they are lost for good. The "block" option means that 
the plugin stops pulling from the input channel altogether. The channel backs 
up, eventually blocking the router, and traffic stops flowing through Heka 
until there's room for the queue to grow again.
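
The three actions can be sketched roughly as follows. This is only a model 
of the behavior described above, not Heka's implementation: the `intake` 
type, its fields, and the exact "would this message exceed the max" check 
are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

var errShutdown = errors.New("queue full: shutting down")

// intake is a hypothetical model of the intake side of a buffered output.
type intake struct {
	size, maxSize int    // current and maximum queue size, in bytes
	fullAction    string // "shutdown", "drop", or "block"
	dropped       int    // messages discarded by the "drop" action
}

// accept tries to queue a message of msgSize bytes. queued reports whether it
// was added; blocked means the caller should stop reading its input channel
// until space frees up; err is non-nil when the process should shut down.
func (in *intake) accept(msgSize int) (queued, blocked bool, err error) {
	if in.size+msgSize <= in.maxSize { // assumed form of the size check
		in.size += msgSize
		return true, false, nil
	}
	switch in.fullAction {
	case "shutdown":
		return false, false, errShutdown
	case "drop":
		in.dropped++ // message is discarded and never queued
		return false, false, nil
	case "block":
		return false, true, nil
	default:
		return false, false, fmt.Errorf("unknown action %q", in.fullAction)
	}
}

// fileDrained models deletion of a fully sent queue file of fileSize bytes.
func (in *intake) fileDrained(fileSize int) { in.size -= fileSize }

func main() {
	for _, action := range []string{"drop", "block", "shutdown"} {
		in := &intake{maxSize: 10, fullAction: action}
		in.accept(8)                         // fits: size is now 8
		queued, blocked, err := in.accept(8) // would exceed the max of 10
		fmt.Println(action, queued, blocked, err)
		in.fileDrained(8)           // a drained file is deleted
		queued, _, _ = in.accept(8) // there is room again
		fmt.Println("  after drain:", queued)
	}
}
```

Note that in this model "drop" and "block" only start accepting messages 
again once `fileDrained` has reduced the tracked size.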

In both the "drop" and "block" cases, correct recovery depends on Heka being 
able to continue processing the buffer. As records get processed and queue 
files are drained, they're deleted. This will push the queue size below the 
maximum size, which then in turn means the intake goroutine can once again 
start appending to the end of the queue.

If you delete queue files out from under the output, things will get weird. The 
output goroutine will probably get confused, because the file handle it's 
holding no longer points to a valid file. Also, Heka won't know to subtract 
that file size from the queue size, so until you do a restart (which causes 
Heka to scan through the queue and recalculate its total size) the queue will 
always seem bigger than it actually is.

One thing that strikes me as odd is how quickly your queue is filling up. Why 
is that happening? The buffer is meant to allow Heka to survive small amounts 
of downtime or disconnect without losing any data, or to handle short bursts. 
It's not magic; if the data is continually coming in more quickly than it can 
go out, then you're going to have a problem no matter what. Disk queuing is 
only going to delay the inevitable.

Does this help at all?

-r


_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
