[ https://issues.apache.org/jira/browse/RANGER-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670611#comment-16670611 ]
Kevin Risden commented on RANGER-1837:
--------------------------------------

Sorry changed jobs :) I can take a look in the next week or so. Maybe [~quirogadf] or [~Khalid Diriye] or [~toftedahl] have some more thoughts...

> Enhance Ranger Audit to HDFS to support ORC file format
> -------------------------------------------------------
>
>                 Key: RANGER-1837
>                 URL: https://issues.apache.org/jira/browse/RANGER-1837
>             Project: Ranger
>          Issue Type: Improvement
>          Components: audit
>            Reporter: Kevin Risden
>            Assignee: Ramesh Mani
>            Priority: Major
>         Attachments: 0001-RANGER-1837-Enhance-Ranger-Audit-to-HDFS-to-support-.patch,
>                      0001-RANGER-1837-Enhance-Ranger-Audit-to-HDFS-to-support-002.patch,
>                      0001-RANGER-1837-Enhance-Ranger-Audit-to-HDFS-to-support_001.patch,
>                      AuditDataFlow.png
>
>
> My team has done some research and found that Ranger HDFS audits are:
> * Stored as JSON objects (one per line)
> * Not compressed
>
> This is currently very verbose and would benefit from compression since this
> data is not frequently accessed.
>
> From Bosco on the mailing list:
> {quote}You are right, currently one of the options is saving the audits in
> HDFS itself as JSON files in one folder per day. I have loaded these JSON
> files from the folder into Hive as compressed ORC format. The compressed
> files in ORC were less than 10% of the original size. So, it was significant
> decrease in size. Also, it is easier to run analytics on the Hive tables.
>
> So, there are couple of ways of doing it.
>
> * Write an Oozie job which runs every night and loads the previous day worth
> audit logs into ORC or other format
> * Write a AuditDestination which can write into the format you want to.
>
> Regardless which approach you take, this would be a good feature for
> Ranger.{quote}
>
> http://mail-archives.apache.org/mod_mbox/ranger-user/201710.mbox/%3CCAJU9nmiYzzUUX1uDEysLAcMti4iLmX7RE%3DmN2%3DdoLaaQf87njQ%40mail.gmail.com%3E

--
This message was sent by Atlassian JIRA (v7.6.3#76005)