In a sandbox filter you typically implement two functions. The first, `process_message`, is called for every incoming message, and it's where you perform the aggregation. The second, `timer_event`, is called every ticker interval, and it's where you emit the data that has been aggregated since the last timer_event call.
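A minimal sketch of that two-function split, written in plain Python rather than against the real sandbox API. The class name `MinuteCounter`, the 60-second interval, and the explicit `timestamp`/`now` arguments are assumptions for illustration only. The sketch also holds back the bucket for the interval still in progress, which is one way to avoid emitting incomplete data.

```python
from collections import defaultdict


class MinuteCounter:
    """Toy stand-in for a sandbox filter; NOT Heka's actual API."""

    def __init__(self, interval=60):
        self.interval = interval          # seconds per bucket (assumed)
        self.counts = defaultdict(int)    # bucket start (seconds) -> count

    def process_message(self, timestamp):
        """Called for every incoming message: aggregate only, emit nothing."""
        bucket = int(timestamp) // self.interval * self.interval
        self.counts[bucket] += 1

    def timer_event(self, now):
        """Called every ticker interval: emit buckets that have fully elapsed."""
        current = int(now) // self.interval * self.interval
        done = sorted(b for b in self.counts if b < current)
        # Pop emitted buckets so each one is reported exactly once.
        return [(b, self.counts.pop(b)) for b in done]


f = MinuteCounter()
for ts in (0, 10, 59, 60, 61):
    f.process_message(ts)
print(f.timer_event(now=90))   # [(0, 3)]  minute 60 is still open
print(f.timer_event(now=120))  # [(60, 2)]
```

Note that this decides completeness purely from the clock passed to `timer_event`, so messages that arrive late for an already emitted bucket would open a new entry for it.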
If you don't want to include data for the most recent interval because that data is still incomplete, nothing stops you from leaving it out; it would likely just take some extra logic (and maybe some copying) in your `timer_event` function.

-r

On 12/10/2014 11:38 PM, 储晓颖(章邯) wrote:
Thanks a lot for the reply. I think my problem occurs while "periodically emitting the circular buffer data which will show up as a graph".

Here is a slide from http://slides.seld.be/?file=2013-12-13+Application+monitoring+with+Heka+and+statsd.html

I notice that the tail of the graph is falling off. I guess it's the same problem as mine: I don't want to emit the real-time data until it is completely correct. The difficulty is: when does HEKA know that a "60 second aggregation" is fully complete?

I think the key to the solution is periodic data collection. We must collect the data at the source periodically and run the aggregation task periodically as well. Then we can emit the correct data when the whole task is finished (like a real-time Hadoop MapReduce job, but where the source is not HDFS). If the data flows as a stream (using Storm, for example), we cannot achieve this easily.

------------------------------------------------------------------
From: Rob Miller <[email protected]>
Sent: Thursday, December 11, 2014, 01:41
To: 储晓颖(章邯) <[email protected]>
Cc: heka <[email protected]>
Subject: Re: [heka] Some question about HEKA

I'm not entirely sure what "the PV (of some minute) from an apache's log in some server" means (page views, probably), but the answer to your question in general is that you'd use a filter to perform any aggregation that you need.

Heka exposes a circular buffer library in its Lua sandbox, specifically intended for handling time series data (see https://github.com/mozilla-services/lua_sandbox/blob/dev/docs/circular_buffer.md). To track by the minute, you'd initialize a cbuf with 60 seconds per row and add values to the column in question (page views, in your case) as they come in, periodically emitting the circular buffer data, which will show up as a graph on the dashboard or can otherwise be converted and processed as you see fit. The cbuf library also supports simple anomaly detection and alerting, if you want to do monitoring of the data.
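To make the "60 seconds per row" idea concrete, here is a rough Python sketch of a time-series ring buffer with a single column. It is not the lua_sandbox `circular_buffer` API; the class name `CircularBuffer` and the `add`/`get` methods are made up for illustration, and timestamps are assumed to be in seconds.

```python
class CircularBuffer:
    """Toy single-column time-series ring buffer (illustrative only)."""

    def __init__(self, rows, seconds_per_row):
        self.rows = rows
        self.seconds_per_row = seconds_per_row
        self.values = [0] * rows
        self.newest = rows - 1  # absolute index of the newest row held

    def add(self, timestamp, value):
        """Add value to the row covering timestamp; False if it is too old."""
        idx = int(timestamp) // self.seconds_per_row
        if idx <= self.newest - self.rows:
            return False  # that row has already been overwritten
        # Advance the window, zeroing the slots that are about to be reused.
        for i in range(max(self.newest + 1, idx - self.rows + 1), idx + 1):
            self.values[i % self.rows] = 0
        self.newest = max(self.newest, idx)
        self.values[idx % self.rows] += value
        return True

    def get(self, timestamp):
        """Return the value for the row covering timestamp, or None."""
        idx = int(timestamp) // self.seconds_per_row
        if idx > self.newest or idx <= self.newest - self.rows:
            return None  # outside the window currently held
        return self.values[idx % self.rows]


# One page view per call; 3 rows of 60 seconds each.
cb = CircularBuffer(rows=3, seconds_per_row=60)
for ts in (5, 30, 65):
    cb.add(ts, 1)
print(cb.get(0), cb.get(60))  # 2 1
```

The fixed number of rows is the point of the design: memory stays constant, and the oldest minute is dropped as soon as a newer one needs its slot.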
Heka ships with a filter that uses the cbuf library to track HTTP status codes that have been parsed out of a web server's logs; see https://hekad.readthedocs.org/en/latest/config/filters/index.html#http-status-graph (or https://github.com/mozilla-services/heka/blob/dev/sandbox/lua/filters/http_status.lua for the source code).

You don't have to use a circular buffer, of course; you can handle the aggregation yourself, and you can emit data in any format you desire, but then you lose the built-in interoperability with the dashboard and the anomaly detection.

Hope this helps,

-r

On 12/09/2014 09:21 AM, 储晓颖(章邯) wrote:
> Hi All,
> I am a software engineer. Recently I learnt about the brilliant HEKA
> project, and I am wondering if it has solved the problems that I used
> to deal with. The most important problem is consistency when
> calculating data for a fixed term. For example, if I want to calculate
> the PV (of some minute) from an apache's log on some server, I have to
> feed the log's content into HEKA and wait for its output. Assuming the
> minute is M, when does HEKA know that the whole log of M has arrived
> and been included in the result? In my situation, I cannot show the PV
> of M until it has been calculated completely. I used to depend on a
> data-driven approach: if the first log of M+1 has arrived and the log
> transfer is in sequence, I can release the data of M. And the solution
> is more complicated when considering the merged PV of distributed
> apaches' logs from many servers.
> Should I still be concerned about this problem if I use HEKA? And how
> does it handle it?
>
> Thanks a lot.
> zhanghan
>
>
> _______________________________________________
> Heka mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/heka
>
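The data-driven rule from the original question (release minute M once a log line from M+1 has arrived, given in-order delivery) can be extended to several servers by tracking the newest minute seen per source and releasing only what every source has moved past. The following is a rough Python sketch of that idea and not anything Heka provides; `WatermarkMerger`, the source names, and the 60-second interval are assumptions, and each source is assumed to deliver its log lines in order.

```python
class WatermarkMerger:
    """Merge per-minute counts from several in-order sources (illustrative)."""

    def __init__(self, sources, interval=60):
        self.interval = interval
        self.counts = {}                          # minute start -> merged count
        self.latest = {s: None for s in sources}  # newest minute seen per source

    def add(self, source, timestamp):
        """Count one log line from the given source."""
        bucket = int(timestamp) // self.interval * self.interval
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        prev = self.latest[source]
        self.latest[source] = bucket if prev is None else max(prev, bucket)

    def release(self):
        """Return (minute, count) pairs that every source has moved past."""
        if any(v is None for v in self.latest.values()):
            return []  # some source has not reported yet
        watermark = min(self.latest.values())
        done = sorted(b for b in self.counts if b < watermark)
        return [(b, self.counts.pop(b)) for b in done]


m = WatermarkMerger(["web1", "web2"])
m.add("web1", 5)
m.add("web1", 61)
m.add("web2", 30)
print(m.release())  # []  web2 is still inside minute 0
m.add("web2", 70)
print(m.release())  # [(0, 2)]
```

One consequence of this design is that a source that goes silent stalls the release for everyone, so a real deployment would likely need a timeout or heartbeat on top of it.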

