[ 
https://issues.apache.org/jira/browse/RANGER-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225418#comment-16225418
 ] 

Ramesh Mani commented on RANGER-1837:
-------------------------------------

[~bosco] Sure, it's better to have an abstract base class that holds the common 
properties and extend it for the different FileWriters (HDFS/Text, HDFS/ORC, 
HDFS/Parquet, HDFS/Avro, etc.). Let me refactor and send the patch for review.
[~madhan.neethiraj]
A compression technique can be added as the default, but we need to think about the 
memory requirements and how it is going to behave when the batch size is huge, say 
in the case of Kafka audits.
I shall add the Hive DDL for the audit schema along with the next patch.
I have attached the patch for the initial work done. Thanks.
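
For discussion, here is a minimal sketch of the base-class idea described above. It is 
not taken from the attached patch: the class names (AuditFileWriterSketch, 
JsonTextWriterSketch, GzipJsonWriterSketch), the field names and the file layout are 
hypothetical, and gzip from the JDK stands in for ORC/Parquet/Avro writers only to 
keep the example self-contained. Each batch is streamed to the output rather than 
buffered as a whole, which is one way to limit memory use for large batches.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Hypothetical base class: holds the properties every file writer shares.
abstract class AuditFileWriterSketch {
    protected final String baseFolder; // assumed layout: one sub-folder per day
    protected final String filePrefix;

    protected AuditFileWriterSketch(String baseFolder, String filePrefix) {
        this.baseFolder = baseFolder;
        this.filePrefix = filePrefix;
    }

    // Format-specific file extension.
    abstract String extension();

    // Writes one batch of JSON audit events (one object per line) to the stream.
    abstract void writeBatch(List<String> jsonEvents, OutputStream out) throws IOException;

    // Common naming logic shared by all formats.
    String fileNameFor(String day) {
        return baseFolder + "/" + day + "/" + filePrefix + "." + extension();
    }
}

// Plain, uncompressed JSON lines (the current behaviour described in the issue).
class JsonTextWriterSketch extends AuditFileWriterSketch {
    JsonTextWriterSketch(String baseFolder, String filePrefix) {
        super(baseFolder, filePrefix);
    }

    @Override
    String extension() {
        return "log";
    }

    @Override
    void writeBatch(List<String> jsonEvents, OutputStream out) throws IOException {
        for (String event : jsonEvents) {
            out.write((event + "\n").getBytes(StandardCharsets.UTF_8));
        }
        out.flush();
    }
}

// Same JSON lines, gzip-compressed; a stand-in for a columnar format writer.
class GzipJsonWriterSketch extends AuditFileWriterSketch {
    GzipJsonWriterSketch(String baseFolder, String filePrefix) {
        super(baseFolder, filePrefix);
    }

    @Override
    String extension() {
        return "log.gz";
    }

    @Override
    void writeBatch(List<String> jsonEvents, OutputStream out) throws IOException {
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        for (String event : jsonEvents) {
            gzip.write((event + "\n").getBytes(StandardCharsets.UTF_8));
        }
        // finish() completes the gzip stream without closing the caller's stream.
        gzip.finish();
        out.flush();
    }
}
```

An ORC, Parquet or Avro writer would be a further subclass that overrides only 
extension() and writeBatch(), while the folder and naming properties stay in the base 
class.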

> HDFS Audit Compression
> ----------------------
>
>                 Key: RANGER-1837
>                 URL: https://issues.apache.org/jira/browse/RANGER-1837
>             Project: Ranger
>          Issue Type: Improvement
>          Components: audit
>            Reporter: Kevin Risden
>         Attachments: RANGER-1837-HDFS-Audit-Compression_001.patch
>
>
> My team has done some research and found that Ranger HDFS audits are:
> * Stored as JSON objects (one per line)
> * Not compressed
> This is currently very verbose and would benefit from compression since this 
> data is not frequently accessed. 
> From Bosco on the mailing list:
> {quote}You are right, currently one of the options is saving the audits in 
> HDFS itself as JSON files in one folder per day. I have loaded these JSON 
> files from the folder into Hive as compressed ORC format. The compressed 
> files in ORC were less than 10% of the original size. So, it was a significant 
> decrease in size. Also, it is easier to run analytics on the Hive tables.
>  
> So, there are a couple of ways of doing it.
>  
> * Write an Oozie job which runs every night and loads the previous day's worth 
> of audit logs into ORC or another format
> * Write an AuditDestination which can write into the format you want
>  
> Regardless of which approach you take, this would be a good feature for 
> Ranger.{quote}
> http://mail-archives.apache.org/mod_mbox/ranger-user/201710.mbox/%3CCAJU9nmiYzzUUX1uDEysLAcMti4iLmX7RE%3DmN2%3DdoLaaQf87njQ%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
