I'm not entirely sure what "the PV (of some minute) from an apache's log in some 
server" means (page views, probably), but the answer to your question in general is 
that you'd use a filter to perform any aggregation that you need.

Heka exposes a circular buffer library in its Lua sandbox, specifically 
intended for handling time series data (see 
https://github.com/mozilla-services/lua_sandbox/blob/dev/docs/circular_buffer.md).
To track by the minute, you'd initialize a cbuf with 60 seconds per row and 
add values to the column in question (page views, in your case) as they come 
in. You'd then periodically emit the circular buffer data, which will show up 
as a graph on the dashboard or can be otherwise converted and processed as you 
see fit. The cbuf library also supports simple anomaly detection and alerting, 
if you want to do monitoring of the data.
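
To make the "60 seconds per row" idea concrete, here is a rough sketch in 
plain Python. It is not the cbuf library and not Heka's API; the MinuteBuffer 
class and its method names are made up for illustration. It only shows how a 
fixed ring of rows can hold per-minute counts, with old rows reused as time 
moves forward.

```python
class MinuteBuffer:
    """Fixed-size ring of per-interval counters.

    Hypothetical illustration only; this is NOT Heka's circular_buffer API.
    """

    def __init__(self, rows, seconds_per_row=60):
        self.rows = rows
        self.seconds_per_row = seconds_per_row
        self.counts = [0] * rows
        self.newest = None  # interval number of the newest row seen so far

    def add(self, timestamp, value=1):
        """Add value to the row covering timestamp (seconds since epoch).

        Returns False if the timestamp is older than the window.
        """
        slot = int(timestamp) // self.seconds_per_row
        if self.newest is None:
            self.newest = slot
        if slot > self.newest:
            # Advance the window and zero every row that gets reused.
            start = max(self.newest + 1, slot - self.rows + 1)
            for s in range(start, slot + 1):
                self.counts[s % self.rows] = 0
            self.newest = slot
        elif slot <= self.newest - self.rows:
            return False  # too old, no row holds this interval any more
        self.counts[slot % self.rows] += value
        return True

    def snapshot(self):
        """Return (row_start_time, count) pairs, oldest row first."""
        if self.newest is None:
            return []
        first = self.newest - self.rows + 1
        return [(s * self.seconds_per_row, self.counts[s % self.rows])
                for s in range(first, self.newest + 1)]
```

Calling add() once per parsed log line and snapshot() on a timer roughly 
mirrors the "add values as they come in, emit periodically" flow described 
above.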

Heka ships with a filter that uses the cbuf library to track HTTP status codes 
that have been parsed out of a web server's logs; see 
https://hekad.readthedocs.org/en/latest/config/filters/index.html#http-status-graph
(or 
https://github.com/mozilla-services/heka/blob/dev/sandbox/lua/filters/http_status.lua
for source code).

You don't have to use a circular buffer, of course; you can handle the 
aggregation yourself, and you can emit data in any format you desire, but then 
you lose the built-in interoperability with the dashboard and the anomaly 
detection.
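
If you do roll your own aggregation, one possible shape is the data-driven 
release you describe in your message: hold the count for minute M until a 
record from a later minute shows up. The Python sketch below is only an 
illustration of that idea, not something Heka provides; MinuteCounter is a 
made-up name, and it assumes a single source whose records arrive in 
timestamp order.

```python
from collections import defaultdict


class MinuteCounter:
    """Counts records per minute and releases a minute once a later one is seen.

    Hypothetical sketch; assumes one in-order stream of records.
    """

    def __init__(self):
        self.pending = defaultdict(int)  # minute number -> count so far

    def add(self, timestamp):
        """Count one record; return finished (minute_start, count) pairs."""
        minute = int(timestamp) // 60
        self.pending[minute] += 1
        # Any minute earlier than the current one is treated as complete.
        done = sorted(m for m in self.pending if m < minute)
        return [(m * 60, self.pending.pop(m)) for m in done]
```

With several servers feeding one counter, the in-order assumption no longer 
holds, so the release rule would have to be adapted.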

Hope this helps,

-r


On 12/09/2014 09:21 AM, 储晓颖(章邯) wrote:
Hi All,
I am a software engineer. Recently I learnt about the brilliant HEKA
project. And I am wondering if She has solved the problems that I used
to deal with. The most important problem is the consistency in
Term-Data-Calculating situation. For example, if I want to calculate the
PV(of some minute) from an apache's log in some server, I have to flow
the log's content into HEKA and wait for its output. Assuming the minute
is M, when does HEKA know that the whole log of M has all arrived and
updated into the result? In my situation, I cannot show the PV of M
until it's calculated completely. I used to depend on a data-driven
approach: if the first log entry of M+1 has arrived and the log transfer is
in sequence, I can release the data of M. The solution is more
complicated when merging PV from distributed Apache logs across many
servers.
     Should I still be concerned about this problem if I use HEKA? And how
does she handle it?

       Thanks a lot.
            zhanghan


_______________________________________________
Heka mailing list
[email protected]
https://mail.mozilla.org/listinfo/heka

