[ 
https://issues.apache.org/jira/browse/EAGLE-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010593#comment-15010593
 ] 

Su Ralph commented on EAGLE-2:
------------------------------

[~libsun] [~yonzhang2012] I think using the Kafka API to get the offset would 
be better than connecting over SSH and running a command line on the target 
machine, because:
1. We should not assume that the host is SSH-enabled.
2. Even if it is enabled, handling SSH permissions would be tricky and 
might cause security issues.
3. Parsing the SSH command-line output could get messy if things become 
complicated (though it should be simple here).
We should rely on programmatic APIs as much as possible.

For reference, see 
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-TheAPIs
and
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example

/// Quoted from the design page:
2. Get the total offset of the Kafka topic using kafka-run-class.sh
    (bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list xxx 
--topic xxx --time -1)
    SSH to the host, execute the shell command, and read the output
    A good SSH library under Apache License 2.0: SSHJ
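
To illustrate what the output handling in the quoted step would involve, here 
is a minimal Python sketch. It assumes each GetOffsetShell output line has the 
form topic:partition:offset (possibly with a comma-separated offset list), 
which should be checked against the Kafka version in use. The function name 
parse_offset_shell_output and the sample text are hypothetical and not part of 
Eagle.

```python
def parse_offset_shell_output(text):
    """Parse GetOffsetShell-style output into {partition: latest_offset}.

    Assumption: each useful line looks like "topic:partition:offset[,offset...]".
    Lines that do not match (for example logging noise) are skipped.
    """
    offsets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right: the last two fields are partition and offsets.
        parts = line.rsplit(":", 2)
        if len(parts) != 3:
            continue
        try:
            partition = int(parts[1])
            values = [int(v) for v in parts[2].split(",") if v]
        except ValueError:
            # Not an offset line after all; ignore it.
            continue
        if values:
            offsets[partition] = max(values)
    return offsets


# Hypothetical sample output for a two-partition topic.
sample = "hdfs_audit_log:0:1200\nhdfs_audit_log:1:980\n"
print(parse_offset_shell_output(sample))
```

Even this small parser has to decide what to do with unexpected lines, which 
is the kind of messiness an API call would avoid.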

> watch message process backlog in Eagle UI
> -----------------------------------------
>
>                 Key: EAGLE-2
>                 URL: https://issues.apache.org/jira/browse/EAGLE-2
>             Project: Eagle
>          Issue Type: Improvement
>         Environment: production
>            Reporter: Edward Zhang
>            Assignee: Libin, Sun
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Message latency is a key factor for Eagle to enable realtime security 
> monitoring. For HDFS audit log monitoring, Kafka is used as the data source, 
> so there is always some gap between the current max offset in Kafka and the 
> processed offset in Eagle. This gap is the backlog, which Eagle should consume 
> as quickly as possible. If the gap is sampled every minute or every 20 
> seconds, we can tell whether Eagle is catching up or falling further behind.
> The command to get the current max offset in Kafka is 
> bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list xxxx --topic 
> hdfs_audit_log --time -1
> and the storm-kafka spout stores the processed offset in ZooKeeper, in the 
> following znode:
> /consumers/hdfs_audit_log/eagle.hdfsaudit.consumer/partition_0 
> So technically we can compute the gap and write it to the Eagle service, then 
> watch the backlog in the UI.
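
The gap calculation described in the issue can be sketched in Python as 
follows. This is only an illustration under stated assumptions: the latest 
offsets per partition are taken as already available (from the Kafka API or 
from GetOffsetShell), and each partition znode is assumed to hold a JSON 
payload with an "offset" field, which should be verified against the 
storm-kafka version in use. The names processed_offset and compute_backlog and 
all sample values are hypothetical.

```python
import json


def processed_offset(znode_payload):
    """Extract the processed offset from a partition znode payload (bytes or str).

    Assumption: the payload is JSON with an integer "offset" field.
    """
    if isinstance(znode_payload, bytes):
        znode_payload = znode_payload.decode("utf-8")
    return int(json.loads(znode_payload)["offset"])


def compute_backlog(latest_offsets, processed_offsets):
    """Return {partition: backlog}, where backlog = latest - processed, floored at 0.

    Assumption: a partition with no processed offset yet counts as processed
    up to offset 0, so its whole latest offset is reported as backlog.
    """
    backlog = {}
    for partition, latest in latest_offsets.items():
        done = processed_offsets.get(partition, 0)
        backlog[partition] = max(latest - done, 0)
    return backlog


# Hypothetical sample values for two partitions.
latest = {0: 1200, 1: 980}
processed = {
    0: processed_offset(b'{"offset": 1150, "partition": 0}'),
    1: processed_offset('{"offset": 980, "partition": 1}'),
}
backlog = compute_backlog(latest, processed)
print(backlog, "total:", sum(backlog.values()))
```

Sampling this total every minute or every 20 seconds, as the issue suggests, 
would show whether the backlog is shrinking or growing.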



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
