One more thing, Bosco: could you please copy some sample HDFS audit logs, HBase logs, and Hive logs here?

I realize that with the Ranger data source we probably still need some minor code development, as follows:

1. Substitute the existing Eagle data source (raw HDFS audit log) with the Ranger data source. For example, in HdfsAuditLogProcessorMain, modify the code to use a different log deserializer.
2. Ensure the output of the Ranger log deserializer is compatible with the existing Eagle data source.

With the above code changes, we automatically get all capabilities such as sensitivity data join, user Hadoop command reassembly, Hive query semantics parsing, etc.

Thanks
Edward Zhang

On 11/29/15, 18:52, "Zhang, Edward (GDI Hadoop)" <[email protected]> wrote:

>Hi Bosco,
>
>Thanks for creating this ticket. It is very helpful if EAGLE can use
>Ranger as data source and automatically get monitoring capability in 9
>Hadoop components.
>
>If a datasource is not from Kafka, and needs a lot of pre-processing, it
>is not trivial to integrate that data source.
>
>Ranger's data source should be uniform in syntax and the integration
>should be straightforward, if we have a uniform deserializer.
>
>I think we can document the steps of integrating a new datasource.
>
>Thanks
>Edward Zhang
>
>On 11/29/15, 12:00, "Don Bosco Durai" <[email protected]> wrote:
>
>>Hi Eagle team
>>
>>I am excited to see all the activities on this project. I have created a
>>JIRA (https://issues.apache.org/jira/browse/EAGLE-59) to track the
>>integration with Apache Ranger.
>>
>>One way to integrate is for Ranger to send the audit logs in the same way
>>as native log format to Kafka. However, Ranger already is doing the
>>normalization of the audit format for all the components. So
>>reconstructing might not be a good way to go.
>>
>>I am still getting familiar with the internals of Apache Eagle, but if
>>someone can help me or document how a 3rd party source can be integrated
>>with Apache Eagle, then it will be great. Also, what is the change
>>required on the analytics side to support new data sources? E.g. If we
>>integrate with Ranger Audit Logs, we would get audit logs from around 9
>>components right away. How can we use it?
>>
>>If you are okay, I am willing to work on this JIRA.
>>
>>Thanks
>>
>>Bosco
>>
>>
>
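
To make the two steps at the top of this thread more concrete, here is a minimal sketch in Python (not Eagle code) of a deserializer that maps one Ranger audit event into a flat field map that the existing HDFS audit pipeline might consume. Everything specific in it is an assumption for illustration: the Ranger field names (reqUser, access, resource, cliIP, result, evtTime), the target field names (user, cmd, src, host, allowed, timestamp), the UTC timestamp format, and the function name. These would need to be checked against real Ranger audit output and the schema Eagle actually expects.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumed Ranger audit field -> assumed Eagle-style HDFS audit field.
FIELD_MAP = {
    "reqUser": "user",
    "access": "cmd",
    "resource": "src",
    "cliIP": "host",
}


def deserialize_ranger_hdfs_audit(raw: bytes) -> dict:
    """Turn one JSON-encoded Ranger audit event into a flat field map.

    Hypothetical helper: the input and output schemas are assumptions,
    not the actual Ranger or Eagle formats.
    """
    event = json.loads(raw.decode("utf-8"))

    # Step 1: rename the normalized Ranger fields (missing fields become None).
    record = {target: event.get(source) for source, target in FIELD_MAP.items()}

    # Step 2: keep the output compatible with downstream consumers by
    # producing the same derived fields a raw-log parser would produce.
    # Assumption: result == 1 means the access was allowed.
    record["allowed"] = event.get("result") == 1

    # Assumption: evtTime looks like "2015-11-29 18:52:00.123" and is in UTC.
    evt_time = datetime.strptime(event["evtTime"], "%Y-%m-%d %H:%M:%S.%f")
    evt_time = evt_time.replace(tzinfo=timezone.utc)
    # Integer milliseconds since the epoch, without float rounding.
    record["timestamp"] = (evt_time - EPOCH) // timedelta(milliseconds=1)

    return record


if __name__ == "__main__":
    sample = json.dumps({
        "reqUser": "alice",
        "access": "READ",
        "resource": "/tmp/private",
        "cliIP": "10.0.0.1",
        "result": 1,
        "evtTime": "2015-11-29 18:52:00.123",
    }).encode("utf-8")
    print(deserialize_ranger_hdfs_audit(sample))
```

If the output keys match what the existing HDFS audit parser emits, the downstream steps (sensitivity join, command reassembly) should not need to know which source the event came from. That is the point of step 2.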
