Afraid so, although your wording is a bit imprecise. Heka never "marks that section 
of the log file as processed." When loading log files using the LogstreamerInput, 
Heka maintains a cursor for each log stream to keep track of how much of the stream 
has been loaded. As the data is loaded, it's parsed by whatever decoder(s) is/are 
registered, and the resulting messages containing the parsed data are injected into 
the router.

The filters and outputs, including the ElasticSearchOutput, don't know anything 
about the original log files. They just receive messages from the router; they 
don't care where the data originally came from.

The ultimate result, though, is what you fear. If the ES output tries to send 
data to ES but fails, then the batch of data it's accumulated will be dropped. 
Issue #1103 (https://github.com/mozilla-services/heka/issues/1103) talks about 
our plan to resolve this through extended disk buffering. For this one case, 
though, an easier fix would be to have the ES output retry, and/or, if it still 
fails, write the data that *would* have been sent to ES out to disk instead. Up 
for tackling that?

-r


On 01/09/2015 01:39 PM, Matt Hughes wrote:
The docs say:

    http_timeout (int):
        Time in milliseconds to wait for a response for each http post to ES.
        This may drop data as there is currently no retry. Default is 0 (no
        timeout).

So…if Heka is trying to send lines 10–30 of a log file and the request
either fails with a non-200 or times out, what happens?

Does Heka really mark that section of the log file as processed and move
on?






_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka
