Hi Edward,

In my current project we are using Prometheus for system/service monitoring
of 200+ servers, and we have issues when it comes to scalability.
What about Apache Drill?

Warm Regards
Sidharth Kumar | Mob: +91 8197 555 599

On Mon, 18 Feb, 2019, 3:18 PM Zhao, Qingwen <[email protected]> wrote:

> Got it. I agree with your idea.
> I have used Prometheus for a while in another project, and it's very easy
> to use and maintain.
>
> Thanks,
> Qingwen
>
> On 2018/12/24, 1:54 PM, "Edward Zhang" <[email protected]> wrote:
>
> Qingwen,
>
> There is no new architecture yet, it is just a very initial discussion :-)
>
> HBase is mainly used for job performance monitoring, where mapreduce
> job/task data are stored. As long as Eagle supports customized job
> performance monitoring, the storage has to be there, although our data
> model is pretty agnostic to the storage implementation.
>
> For metrics, Eagle 0.5 actually does not store metrics and only processes
> them in streaming mode. My suggestion is that we use mature tools like
> Prometheus to store and visualize metrics while Eagle focuses on policy
> evaluation.
>
> Thanks
> Edward
>
> On Thu, Oct 25, 2018 at 3:00 AM Zhao Qingwen <[email protected]> wrote:
>
> > Hi Edward,
> >
> > In the new architecture, is the storage (hbase) taken off?
> > How do the adaptors store the data? For example, hadoop namenode
> > metrics.
> >
> > Best Regards,
> > Qingwen Zhao | 赵晴雯
> >
> > Edward Zhang <[email protected]> wrote on Fri, Oct 12, 2018
> > at 10:49 AM:
> >
> > > Hi Eaglers,
> > >
> > > I would like to start some discussion about architecture improvement
> > > for Apache Eagle based on community experience and feedback. The
> > > improvement is targeted at simplifying the installation and
> > > development of Apache Eagle.
> > >
> > > Eagle's main responsibility is to report abnormalities instantly by
> > > applying policies on streaming data. Eagle consists of two major
> > > components, Policy Engine and Adaptors.
> > > Policy Engine is a standalone application which provides a REST API
> > > to manage the policy lifecycle for different data sources, and
> > > provides a runtime to evaluate policies on streaming data. Adaptors
> > > are those applications which fetch/process logs/metrics from outside
> > > and send data to the policy engine for alerting purposes.
> > >
> > > But right now the Eagle code base is not clearly focused on these two
> > > components. For example, the current source code includes map/reduce
> > > job/task log retrieval/cleanup/analysis, which is very useful, but
> > > Eagle probably only needs the data retrieval/cleanup part, so that
> > > data can be streamed into the policy engine for alerting purposes.
> > > The job/task analysis part can be maintained in another project.
> > >
> > > First let me list the main modules the Eagle source code consists of:
> > > - eagle core
> > >   - policy engine (coordinator, runtime, and web)
> > >   - monitor application management
> > >   - eagle query framework - for querying time series data from hbase
> > > - eagle adaptors
> > >   - gc log fetch/processing and alerting
> > >   - metric fetch/processing and alerting, including name node, data
> > >     node, hbase etc.
> > >   - jpm: job performance management
> > >     - hadoop yarn queue statistics fetch/processing
> > >     - hadoop mapreduce history job log processing
> > >     - hadoop mapreduce running job processing
> > >     - spark history job log processing
> > >     - spark running job processing
> > >     - jpm web application
> > >     - hadoop job analyzer
> > >   - security monitoring
> > >     - hdfs audit log fetch/processing
> > >     - hdfs auth log fetch/processing
> > >     - hbase audit log fetch/processing
> > >     - hive log fetch/processing
> > >     - maprfs audit log fetch/processing
> > >     - oozie audit log fetch/processing
> > >   - hadoop topology stats fetch/processing
> > > - eagle server
> > >
> > > It is very obvious that it does not scale for the Eagle community to
> > > maintain such a large number of monitoring adaptors, especially when
> > > Hadoop/Spark versions are evolving pretty fast.
> > >
> > > My suggestion is that Eagle focus ONLY on the policy engine and some
> > > important default adaptors, and remove/separate the unrelated
> > > functionality. For the policy engine, it would be nice if it could
> > > run on popular streaming engines besides Apache Storm, so that it can
> > > be easily deployed by community users. For the important default
> > > adaptors, I would suggest Eagle keep ONLY HDFS audit log, Hadoop
> > > running job, Spark running job, HDFS namenode metrics, etc. For the
> > > unrelated functionality, we can either remove it from the Eagle code
> > > base or separate it into standalone executables, if the community
> > > still really needs it under the Apache Eagle monitoring umbrella.
> > > So the proposed Eagle code base would be like:
> > > - policy engine
> > >   - coordinator
> > >   - runtime
> > >   - web
> > > - adaptors
> > >   - hdfs audit log
> > >   - Hadoop running job
> > >   - Spark running job
> > >   - HDFS namenode metrics
> > >   - Hadoop yarn queue metrics
> > > - extensions (some non-default adaptors contributed by the community)
> > > - executables (standalone executables which are legacy)
> > >
> > > It would be great if you could provide more feedback on this
> > > discussion.
> > >
> > > (By the way, I have also had a lot of discussion about this topic
> > > with Hao Chen, Eagle PMC member and core developer, based on his
> > > experience of engaging Eagle users.)
> > >
> > > Thanks
> > > Edward
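[Editor's note] To make the adaptor/policy-engine split Edward describes concrete, here is a minimal sketch. It is not Apache Eagle's real API: the names `Policy`, `PolicyEngine`, and `hdfs_audit_adaptor`, and the tab-separated audit-log format, are all illustrative assumptions. An adaptor parses raw log lines into events and streams them to a policy engine, which evaluates registered policies and emits alerts.

```python
# Hypothetical sketch of the adaptor / policy-engine split discussed in the
# thread. Names and log format are illustrative, not Apache Eagle's real API.
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Policy:
    name: str
    source: str                          # which adaptor stream this applies to
    predicate: Callable[[Dict], bool]    # True => raise an alert


class PolicyEngine:
    """Evaluates registered policies against streaming events."""

    def __init__(self) -> None:
        self.policies: List[Policy] = []

    def register(self, policy: Policy) -> None:
        self.policies.append(policy)

    def evaluate(self, source: str, events: Iterable[Dict]) -> Iterator[Dict]:
        # Streaming evaluation: one pass over events, no storage involved.
        for event in events:
            for policy in self.policies:
                if policy.source == source and policy.predicate(event):
                    yield {"policy": policy.name, "event": event}


def hdfs_audit_adaptor(raw_lines: Iterable[str]) -> Iterator[Dict]:
    """Adaptor: parse raw key=value audit log lines into events."""
    for line in raw_lines:
        yield dict(kv.split("=", 1) for kv in line.split("\t"))


engine = PolicyEngine()
engine.register(Policy(
    name="sensitive-file-delete",
    source="hdfs_audit",
    predicate=lambda e: e.get("cmd") == "delete"
                        and e.get("src", "").startswith("/secure"),
))

raw = [
    "ugi=alice\tcmd=open\tsrc=/secure/keys",
    "ugi=bob\tcmd=delete\tsrc=/secure/keys",
    "ugi=bob\tcmd=delete\tsrc=/tmp/scratch",
]
alerts = list(engine.evaluate("hdfs_audit", hdfs_audit_adaptor(raw)))
print([a["event"]["ugi"] for a in alerts])  # → ['bob']
```

The point of the design, as in Edward's proposal, is that the engine knows nothing about where events come from or where metrics are stored; adaptors are thin and replaceable.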

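[Editor's note] On Sidharth's scalability question: Prometheus's documented approach for large fleets is hierarchical federation, where a global server scrapes pre-aggregated series from per-cluster servers through the built-in `/federate` endpoint. A sketch of the global server's scrape config (hostnames and job names below are illustrative):

```yaml
# Global Prometheus server federating from per-cluster Prometheus servers.
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 60s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="hadoop-namenode"}'
        - '{__name__=~"job:.*"}'   # pre-aggregated recording-rule series
    static_configs:
      - targets:
          - 'prometheus-cluster-a:9090'
          - 'prometheus-cluster-b:9090'
```

Federating only recording-rule aggregates, rather than every raw series, is what keeps the global server's load bounded as the fleet grows.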