I have some spare time and was planning to work on this. If no one is currently looking into this JIRA, could you assign it to me?
https://issues.apache.org/jira/browse/EAGLE-59

Thanks

Bosco

On 11/29/15, 8:43 PM, "Don Bosco Durai" <[email protected]> wrote:

Edward

Thanks. I will look into HdfsAuditLogProcessorMain class. I will upload the sample files in the JIRA.

Thanks

Bosco

On 11/29/15, 7:56 PM, "Zhang, Edward (GDI Hadoop)" <[email protected]> wrote:

>One more thing, Bosco, could you please copy some sample hdfs audit log,
>hbase log and hive log to here?
>
>I realize with Ranger data source, we probably still need some minor code
>development as follows
>1. Substitute existing eagle data source(raw hdfs audit log) with Ranger
>data source, for example, in HdfsAuditLogProcessorMain, modify the code to
>use different log deserializer
>2. Ensure output of Ranger log deserializer is compatible to existing
>eagle data source.
>
>With the above code change, we can automatically get all capabilities like
>sensitivity data join, user hadoop command reassembly, hive query
>semantics parsing etc.
>
>Thanks
>Edward Zhang
>
>On 11/29/15, 18:52, "Zhang, Edward (GDI Hadoop)" <[email protected]> wrote:
>
>>Hi Bosco,
>>
>>Thanks for creating this ticket. It is very helpful if EAGLE can use
>>Ranger as data source and automatically get monitoring capability in 9
>>Hadoop components.
>>
>>If a datasource is not from Kafka, and needs a lot of pre-processing, it
>>is not trivial to integrate that data source.
>>
>>Ranger's data source should be uniform in syntax and the integration
>>should be straightforward, if we have a uniform deserializer.
>>
>>I think we can document the steps of integrating a new datasource.
>>
>>Thanks
>>Edward Zhang
>>
>>On 11/29/15, 12:00, "Don Bosco Durai" <[email protected]> wrote:
>>
>>>Hi Eagle team
>>>
>>>I am excited to see all the activities on this project. I have created a
>>>JIRA (https://issues.apache.org/jira/browse/EAGLE-59) to track the
>>>integration with Apache Ranger.
>>>
>>>One way to integrate is for Ranger to send the audit logs in the same way
>>>as native log format to Kafka. However, Ranger already is doing the
>>>normalization of the audit format for all the components. So
>>>reconstructing might not be a good way to go.
>>>
>>>I am still getting familiar with the internals of Apache Eagle, but if
>>>someone can help me or document how a 3rd party source can be integrated
>>>with Apache Eagle, then it will be great. Also, what is the change
>>>required on the analytics side to support new data sources? E.g. If we
>>>integrate with Ranger Audit Logs, we would get audit logs from around 9
>>>components right away. How can we use it?
>>>
>>>If you are okay, I am willing to work on this JIRA.
>>>
>>>Thanks
>>>
>>>Bosco
>>>
>>>
>>
>
