Hi Edward,

In my current project we are using Prometheus for system/service monitoring
of 200+ servers, and we have issues when it comes to scalability.
What about Apache Drill?

Warm Regards
Sidharth Kumar | Mob: +91 8197 555 599

On Mon, 18 Feb, 2019, 3:18 PM Zhao, Qingwen <[email protected]> wrote:

> Got it. I agree with your idea.
> I have used Prometheus for a while in another project, and it's very easy
> to use and maintain.
>
> Thanks,
> Qingwen
>
> On 2018/12/24, 1:54 PM, "Edward Zhang" <[email protected]> wrote:
>
> Qingwen,
>
> There is no new architecture yet, it is just a very initial discussion :-)
>
> HBase is mainly used for job performance monitoring, where mapreduce
> job/task data are stored. As long as Eagle supports customized job
> performance monitoring, the storage has to be there, although our data
> model is pretty agnostic to the storage implementation.
>
> For metrics, Eagle 0.5 actually does not store metrics and only processes
> them in streaming mode. My suggestion is that we use mature tools like
> Prometheus to store and visualize metrics while Eagle focuses on policy
> evaluation.
>
> Thanks
> Edward
>
> On Thu, Oct 25, 2018 at 3:00 AM Zhao Qingwen <[email protected]> wrote:
>
> > Hi Edward,
> >
> > In the new architecture, is the storage (hbase) taken off?
> > How do the adaptors store the data? For example, hadoop namenode
> > metrics.
> >
> > Best Regards,
> > Qingwen Zhao | 赵晴雯
> >
> > Edward Zhang <[email protected]> wrote on Fri, Oct 12, 2018
> > at 10:49 AM:
> >
> > > Hi Eaglers,
> > >
> > > I would like to start some discussion about architecture improvement
> > > for Apache Eagle based on community experience and feedback. The
> > > improvement is targeted at simplifying the installation and
> > > development of Apache Eagle.
> > >
> > > Eagle's main responsibility is to report abnormalities instantly by
> > > applying policies on streaming data. Eagle consists of two major
> > > components, Policy Engine and Adaptors.
> > > Policy Engine is a standalone application which provides a REST API
> > > to manage the policy lifecycle for different data sources, and
> > > provides a runtime to evaluate policies on streaming data. Adaptors
> > > are those applications which fetch/process logs/metrics from outside
> > > and send data to the policy engine for alerting purposes.
> > >
> > > But right now the Eagle code base is not clearly focused on these two
> > > components. For example, the current source code includes map/reduce
> > > job/task log retrieval/cleanup/analysis, which is very useful, but
> > > Eagle probably only needs the data retrieval/cleanup part, so that
> > > data can be streamed into the policy engine for alerting purposes.
> > > The job/task analysis part can be maintained in another project.
> > >
> > > First let me list the main modules the Eagle source code consists of:
> > > - eagle core
> > >   - policy engine (coordinator, runtime, and web)
> > >   - monitor application management
> > >   - eagle query framework - for querying time series data from hbase
> > > - eagle adaptors
> > >   - gc log fetch/processing and alerting
> > >   - metric fetch/processing and alerting, including name node, data
> > >     node, hbase etc.
> > >   - jpm: job performance management
> > >     - hadoop yarn queue statistics fetch/processing
> > >     - hadoop mapreduce history job log processing
> > >     - hadoop mapreduce running job processing
> > >     - spark history job log processing
> > >     - spark running job processing
> > >     - jpm web application
> > >     - hadoop job analyzer
> > >   - security monitoring
> > >     - hdfs audit log fetch/processing
> > >     - hdfs auth log fetch/processing
> > >     - hbase audit log fetch/processing
> > >     - hive log fetch/processing
> > >     - maprfs audit log fetch/processing
> > >     - oozie audit log fetch/processing
> > >   - hadoop topology stats fetch/processing
> > > - eagle server
> > >
> > > It is very obvious that it does not scale for the Eagle community to
> > > maintain such a large number of monitoring adaptors, especially when
> > > Hadoop/Spark versions are evolving pretty fast.
> > >
> > > My suggestion is that Eagle focus ONLY on the policy engine and some
> > > important default adaptors, and remove/separate the unrelated
> > > functionality. For the policy engine, it would be nice if it could
> > > run on popular streaming engines besides Apache Storm, so that it can
> > > be easily deployed by community users. For the important default
> > > adaptors, I would suggest Eagle keep ONLY HDFS audit log, Hadoop
> > > running job, Spark running job, HDFS namenode metrics, etc. For the
> > > unrelated functionality, we can either remove it from the Eagle code
> > > base or separate it into standalone executables, if the community
> > > still really needs it under the Apache Eagle monitoring umbrella.
> > > So the proposed Eagle code base would be like:
> > > - policy engine
> > >   - coordinator
> > >   - runtime
> > >   - web
> > > - adaptors
> > >   - hdfs audit log
> > >   - Hadoop running job
> > >   - Spark running job
> > >   - HDFS namenode metrics
> > >   - Hadoop yarn queue metrics
> > > - extensions (some non-default adaptors contributed by the community)
> > > - executables (standalone executables which are legacy)
> > >
> > > It would be great if you could provide more feedback on this
> > > discussion.
> > >
> > > (By the way, I have also had a lot of discussion about this topic
> > > with Hao Chen, Eagle PMC member and core developer, based on his
> > > experience of engaging Eagle users.)
> > >
> > > Thanks
> > > Edward
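[Editor's note] To make the adaptor/policy-engine split Edward describes concrete, here is a minimal sketch. It is not Apache Eagle's real API: the names `Policy`, `PolicyEngine`, and `hdfs_audit_adaptor`, and the tab-separated audit-log format, are all illustrative assumptions. An adaptor parses raw log lines into events and streams them to a policy engine, which evaluates registered policies and emits alerts.

```python
# Hypothetical sketch of the adaptor / policy-engine split discussed in the
# thread. Names and log format are illustrative, not Apache Eagle's real API.
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Policy:
    name: str
    source: str                          # which adaptor stream this applies to
    predicate: Callable[[Dict], bool]    # True => raise an alert


class PolicyEngine:
    """Evaluates registered policies against streaming events."""

    def __init__(self) -> None:
        self.policies: List[Policy] = []

    def register(self, policy: Policy) -> None:
        self.policies.append(policy)

    def evaluate(self, source: str, events: Iterable[Dict]) -> Iterator[Dict]:
        # Streaming evaluation: one pass over events, no storage involved.
        for event in events:
            for policy in self.policies:
                if policy.source == source and policy.predicate(event):
                    yield {"policy": policy.name, "event": event}


def hdfs_audit_adaptor(raw_lines: Iterable[str]) -> Iterator[Dict]:
    """Adaptor: parse raw key=value audit log lines into events."""
    for line in raw_lines:
        yield dict(kv.split("=", 1) for kv in line.split("\t"))


engine = PolicyEngine()
engine.register(Policy(
    name="sensitive-file-delete",
    source="hdfs_audit",
    predicate=lambda e: e.get("cmd") == "delete"
                        and e.get("src", "").startswith("/secure"),
))

raw = [
    "ugi=alice\tcmd=open\tsrc=/secure/keys",
    "ugi=bob\tcmd=delete\tsrc=/secure/keys",
    "ugi=bob\tcmd=delete\tsrc=/tmp/scratch",
]
alerts = list(engine.evaluate("hdfs_audit", hdfs_audit_adaptor(raw)))
print([a["event"]["ugi"] for a in alerts])  # → ['bob']
```

The point of the design, as in Edward's proposal, is that the engine knows nothing about where events come from or where metrics are stored; adaptors are thin and replaceable.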

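[Editor's note] On Sidharth's scalability question: Prometheus's documented approach for large fleets is hierarchical federation, where a global server scrapes pre-aggregated series from per-cluster servers through the built-in `/federate` endpoint. A sketch of the global server's scrape config (hostnames and job names below are illustrative):

```yaml
# Global Prometheus server federating from per-cluster Prometheus servers.
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 60s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="hadoop-namenode"}'
        - '{__name__=~"job:.*"}'   # pre-aggregated recording-rule series
    static_configs:
      - targets:
          - 'prometheus-cluster-a:9090'
          - 'prometheus-cluster-b:9090'
```

Federating only recording-rule aggregates, rather than every raw series, is what keeps the global server's load bounded as the fleet grows.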