-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/63552/
-----------------------------------------------------------
(Updated March 31, 2021, 6:01 p.m.)
Review request for ranger, Don Bosco Durai, Abhay Kulkarni, Madhan Neethiraj,
Mehul Parikh, Selvamohan Neethiraj, Sailaja Polavarapu, and Velmurugan
Periasamy.
Changes
-------
Rebased to include HFlushCapableStream check
Bugs: RANGER-1837
https://issues.apache.org/jira/browse/RANGER-1837
Repository: ranger
Description
-------
RANGER-1837: Enhance Ranger Audit to HDFS to support ORC file format
Diffs (updated)
-----
agents-audit/pom.xml b9f6af27c
agents-audit/src/main/java/org/apache/ranger/audit/destination/HDFSAuditDestination.java
5e6f40226
agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditHandler.java
4ce31dd09
agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditProviderFactory.java
6b7f4b00b
agents-audit/src/main/java/org/apache/ranger/audit/provider/AuditWriterFactory.java
PRE-CREATION
agents-audit/src/main/java/org/apache/ranger/audit/provider/BaseAuditHandler.java
54f37644b
agents-audit/src/main/java/org/apache/ranger/audit/provider/DummyAuditProvider.java
05f882ff3
agents-audit/src/main/java/org/apache/ranger/audit/provider/MiscUtil.java
e2b74489b
agents-audit/src/main/java/org/apache/ranger/audit/provider/MultiDestAuditProvider.java
282f5abfa
agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileCacheProviderSpool.java
41513ba40
agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileQueue.java
PRE-CREATION
agents-audit/src/main/java/org/apache/ranger/audit/queue/AuditFileQueueSpool.java
PRE-CREATION
agents-audit/src/main/java/org/apache/ranger/audit/utils/AbstractRangerAuditWriter.java
PRE-CREATION
agents-audit/src/main/java/org/apache/ranger/audit/utils/ORCFileUtil.java
PRE-CREATION
agents-audit/src/main/java/org/apache/ranger/audit/utils/RangerAuditWriter.java
PRE-CREATION
agents-audit/src/main/java/org/apache/ranger/audit/utils/RangerJSONAuditWriter.java
PRE-CREATION
agents-audit/src/main/java/org/apache/ranger/audit/utils/RangerORCAuditWriter.java
PRE-CREATION
Diff: https://reviews.apache.org/r/63552/diff/7/
Changes: https://reviews.apache.org/r/63552/diff/6-7/
Testing
-------
Testing done locally.
ORC FILE FORMAT in HDFS Ranger Audit log with local audit file store as source
for HDFS audit:
NOTE: In this mode, each record in the local file is read to create the
ORC file.
1. Enable Ranger Audit to HDFS in ORC file format using AuditFileQueue
      - To enable Ranger Audit to HDFS with ORC format, first enable
AuditFileQueue to spool the audit events to a local file.
        * On the NameNode host, create the spool directory and make sure the
path is readable/writable/executable by the owner of the service for which the
Ranger plugin is enabled (e.g. hdfs:hadoop for the HDFS service, hive:hadoop
for the Hive service, etc.)
          $ mkdir -p /var/log/hadoop/audit/staging/spool
          $ cd /var/log/hadoop/audit/staging
          $ chown hdfs:hadoop spool
        * Enable AuditFileQueue via the following params in
ranger-<component>-audit.xml
xasecure.audit.destination.hdfs.batch.queuetype=filequeue (NOTE:
default = memqueue, where an in-memory queue/buffer is used instead of a
local file buffer)
xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec=300
(This determines the batch size of each ORC file created)
xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir=/var/log/hadoop/audit/staging/spool
(This is the local staging directory for audit)
xasecure.audit.destination.hdfs.batch.filequeue.filespool.buffer.size=10000
(This, along with the rollover.sec parameter, determines the batch size for
ORC file creation)
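The filequeue settings above can be written as property entries in ranger-<component>-audit.xml; a sketch using the example values from the steps above:

```xml
<!-- Sketch: AuditFileQueue spool settings (values are the examples above) -->
<property>
  <name>xasecure.audit.destination.hdfs.batch.queuetype</name>
  <value>filequeue</value> <!-- default: memqueue -->
</property>
<property>
  <name>xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec</name>
  <value>300</value>
</property>
<property>
  <name>xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir</name>
  <value>/var/log/hadoop/audit/staging/spool</value>
</property>
<property>
  <name>xasecure.audit.destination.hdfs.batch.filequeue.filespool.buffer.size</name>
  <value>10000</value>
</property>
```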
2. Enable the ORC file format for Ranger HDFS Audit.
      - This is done by setting the following param in
ranger-<component>-audit.xml. The default value is "json".
xasecure.audit.destination.hdfs.filetype=orc (default = json)
3. Provision to control the compression technique for the ORC format. Default
is 'snappy'.
xasecure.audit.destination.hdfs.orc.compression=snappy|lzo|zlib|none
4. Buffer size and stripe size of the ORC file batch. Defaults are '10000'
bytes and '100000' bytes respectively. These determine the batch size of the
ORC files written to HDFS.
xasecure.audit.destination.hdfs.orc.buffersize= (value in bytes)
xasecure.audit.destination.hdfs.orc.stripesize= (value in bytes)
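The ORC-specific settings from steps 2-4 can likewise be sketched as property entries in ranger-<component>-audit.xml (values shown are the stated defaults):

```xml
<!-- Sketch: ORC output settings (defaults from the steps above) -->
<property>
  <name>xasecure.audit.destination.hdfs.filetype</name>
  <value>orc</value> <!-- default: json -->
</property>
<property>
  <name>xasecure.audit.destination.hdfs.orc.compression</name>
  <value>snappy</value> <!-- snappy|lzo|zlib|none -->
</property>
<property>
  <name>xasecure.audit.destination.hdfs.orc.buffersize</name>
  <value>10000</value> <!-- bytes -->
</property>
<property>
  <name>xasecure.audit.destination.hdfs.orc.stripesize</name>
  <value>100000</value> <!-- bytes -->
</property>
```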
5. Hive query to create an ORC table with the default 'snappy' compression.
CREATE EXTERNAL TABLE ranger_audit_event (
repositoryType int,
repositoryName string,
reqUser string,
evtTime string,
accessType string,
resourcePath string,
resourceType string,
action string,
accessResult string,
agentId string,
policyId bigint,
resultReason string,
aclEnforcer string,
sessionId string,
clientType string,
clientIP string,
requestData string,
clusterName string
)
STORED AS ORC
LOCATION '/ranger/audit/hdfs'
TBLPROPERTIES ("orc.compress"="SNAPPY");
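Once the table is created, the ORC audit files can be queried directly. An illustrative example (column names from the DDL above; the accessResult filter assumes Ranger's convention that '0' denotes a denied request):

```sql
-- Illustrative: most recent denied-access events
SELECT reqUser, accessType, resourcePath, evtTime
FROM ranger_audit_event
WHERE accessResult = '0'
ORDER BY evtTime DESC
LIMIT 100;
```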
-------------------------
JSON FILE FORMAT in HDFS Ranger Audit log with local audit file store as source
for HDFS audit:
NOTE: In this mode, each local file is copied entirely to the HDFS
destination. This lets us generate larger Ranger audit files in HDFS, which is
preferred.
1. Enable Ranger Audit to HDFS in JSON file format using AuditFileQueue
      - To enable Ranger Audit to HDFS with JSON format and local file
caching, first enable AuditFileQueue to spool the audit events locally.
        * On the NameNode host, create the spool directory and make sure the
path is readable/writable/executable by the owner of the service for which the
Ranger plugin is enabled (e.g. hdfs:hadoop for the HDFS service, hive:hadoop
for the Hive service, etc.)
          $ mkdir -p /var/log/hadoop/audit/staging/spool
          $ cd /var/log/hadoop/audit/staging
          $ chown hdfs:hadoop spool
        * Enable AuditFileQueue via the following params in
ranger-<component>-audit.xml
xasecure.audit.destination.hdfs.batch.queuetype=filequeue (NOTE:
default = memqueue, where an in-memory queue/buffer is used instead of a
local file buffer)
xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec=300
(This determines the size of each JSON file copied to HDFS)
xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir=/var/log/hadoop/audit/staging/spool
(This is the local staging directory for audit)
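For the JSON case the same spool properties apply, with the filetype left at its default; a sketch of the ranger-<component>-audit.xml entries using the example values above:

```xml
<!-- Sketch: JSON mode filequeue settings; filetype stays at its default (json) -->
<property>
  <name>xasecure.audit.destination.hdfs.batch.queuetype</name>
  <value>filequeue</value>
</property>
<property>
  <name>xasecure.audit.destination.hdfs.batch.filequeue.filespool.file.rollover.sec</name>
  <value>300</value>
</property>
<property>
  <name>xasecure.audit.destination.hdfs.batch.filequeue.filespool.dir</name>
  <value>/var/log/hadoop/audit/staging/spool</value>
</property>
```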
Thanks,
Ramesh Mani