fateh288 opened a new pull request, #250:
URL: https://github.com/apache/ranger/pull/250

   ## What changes were proposed in this pull request?
   This change fixes the functionality for storing audit logs in ORC format 
that was proposed in RANGER-1837 (https://reviews.apache.org/r/63552/diff/7/).
   
   ## How was this patch tested?
   The patch was tested with the HDFS plugin.
   
   The following steps were carried out:
   1. On the NameNode host, created the spool directory and changed its owner 
so that the owner of the service can read, write, and execute it:
   mkdir -p  /var/log/hdfs/audit/staging/spool
   cd /var/log/hdfs/audit/staging
   chown hdfs:hadoop spool
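
   The spool directory setup above can be wrapped in a small helper. This is a minimal sketch, not part of the patch: the function name `make_spool_dir` is hypothetical, and the explicit `chmod u+rwx` is an added assumption to make the owner permissions described in this step explicit.

```shell
#!/bin/sh
# Sketch: create a spool directory that its owner can read, write, and traverse.
# Usage: make_spool_dir DIR [OWNER:GROUP]
make_spool_dir() {
    dir=$1
    owner=${2:-}
    # -p creates missing parent directories and is a no-op if DIR already exists.
    mkdir -p "$dir" || return 1
    # chown normally needs root, so it is only attempted when an owner is given.
    if [ -n "$owner" ]; then
        chown "$owner" "$dir" || return 1
    fi
    # Assumption: grant rwx to the owner explicitly rather than relying on umask.
    chmod u+rwx "$dir"
}
```

   Run as root on the NameNode host, the call matching this step would be `make_spool_dir /var/log/hdfs/audit/staging/spool hdfs:hadoop`.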
   
   Enabled AuditFileQueue via the following parameters in ranger-hdfs-audit.xml:
   xasecure.audit.destination.hdfs.batch.queuetype=filequeue
   xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec=300
   xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir=/var/log/hdfs/audit/staging/spool
   xasecure.audit.destination.hdfs.batch.filequeue.filespool.buffer.size=10000
   
   2. Enabled the ORC file format for Ranger HDFS audit in ranger-hdfs-audit.xml:
   xasecure.audit.destination.hdfs.filetype=orc
   
   3. Set the compression codec used for the ORC format in 
ranger-hdfs-audit.xml:
   xasecure.audit.destination.hdfs.orc.compression=snappy
   
   4. Set the buffer size and stripe size of the ORC file batch. The defaults 
are 10000 bytes and 100000 bytes, respectively:
   xasecure.audit.destination.hdfs.orc.buffersize=10000
   xasecure.audit.destination.hdfs.orc.stripesize=100000
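
   The properties in steps 1 to 4 are written here as `name=value` pairs. As a rough sketch, assuming ranger-hdfs-audit.xml uses the usual Hadoop-style `<configuration>`/`<property>` layout, a hypothetical helper `props_to_xml` can render them as XML entries. It does no XML escaping, which is sufficient for the values used in these steps.

```shell
#!/bin/sh
# Sketch: read name=value lines on stdin and print Hadoop-style <property> elements.
props_to_xml() {
    echo '<configuration>'
    # IFS='=' splits at the first '='; anything after it stays in $value.
    while IFS='=' read -r name value; do
        # Skip blank lines and lines starting with '#'.
        case "$name" in ''|'#'*) continue ;; esac
        printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$name" "$value"
    done
    echo '</configuration>'
}

# The seven properties from steps 1 to 4, unchanged.
props_to_xml <<'EOF'
xasecure.audit.destination.hdfs.batch.queuetype=filequeue
xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec=300
xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir=/var/log/hdfs/audit/staging/spool
xasecure.audit.destination.hdfs.batch.filequeue.filespool.buffer.size=10000
xasecure.audit.destination.hdfs.filetype=orc
xasecure.audit.destination.hdfs.orc.compression=snappy
xasecure.audit.destination.hdfs.orc.buffersize=10000
xasecure.audit.destination.hdfs.orc.stripesize=100000
EOF
```

   The output is meant to be merged by hand into the existing file rather than to replace it, since the file normally carries other audit settings as well.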
   
   5. Added the ORC jars to the plugin classpath. 
   The plugins are missing the orc-core, orc-shims, and aircompressor 
dependencies, so symlinks were added manually to the plugin classpath. 
   Work is in progress to add these dependencies to the distro instead of 
adding them manually to ranger-hdfs-plugin-impl:
   
   cd path/hadoop/lib/ranger-hdfs-plugin-impl
   ln -s jar_location/jars/orc-core-1.7.6.jar .
   ln -s jar_location/jars/orc-shims-1.7.6.jar .
   ln -s jar_location/jars/aircompressor-0.10.jar .
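
   The three `ln -s` commands above can be made repeatable with a loop that reports missing jars. This is only a sketch: the function name `link_orc_jars` is hypothetical, and the source and plugin directories are passed as arguments because the real locations are not given here.

```shell
#!/bin/sh
# Sketch: symlink the ORC-related jars into the plugin directory.
# Usage: link_orc_jars SRC_DIR PLUGIN_DIR
# SRC_DIR should be an absolute path, because the link target is stored as given.
link_orc_jars() {
    src=$1
    dest=$2
    status=0
    for jar in orc-core-1.7.6.jar orc-shims-1.7.6.jar aircompressor-0.10.jar; do
        if [ ! -f "$src/$jar" ]; then
            # Report the missing jar but keep going so all problems are listed.
            echo "missing: $src/$jar" >&2
            status=1
            continue
        fi
        # -s creates a symbolic link; -f replaces a link left by an earlier run.
        ln -sf "$src/$jar" "$dest/$jar"
    done
    return "$status"
}
```

   The function returns non-zero if any of the three jars is missing, so a wrapper script can stop before the restart in the next step.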
   
   6. Restarted HDFS to apply the stale configuration.
   
   7. Verified by creating a Hive table over the ORC data:
   
   CREATE EXTERNAL TABLE ranger_audit_event_new(
   repoType int,
   repo string,
   reqUser string,
   evtTime string,
   access string,
   resource string,
   resType string,
   action string,
   result int,
   agent string,
   policy int,
   reason string,
   enforcer string,
   cliIP string,
   agentHost string,
   logType string,
   id string,
   seq_num int,
   event_count int,
   event_dur_ms int,
   tags string,
   additional_info string,
   cluster_name string
   )
   STORED AS ORC
   LOCATION '/ranger/audit/hdfs/hdfs/20230414'
   TBLPROPERTIES ("orc.compress"="SNAPPY");
   
   8. A select query displays the audit log data stored in ORC format correctly:
   select * from ranger_audit_event_new;
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
