[
https://issues.apache.org/jira/browse/RANGER-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830795#comment-15830795
]
Don Bosco Durai commented on RANGER-1310:
-----------------------------------------
[~rmani], thanks for the write-up. A few comments and suggestions:
1. Starting with Ranger Audit V3 (version 3), we began using the terms Queue
and Destination. This lets us mix and reuse processing and destination
implementations (async_queue -> summary_queue -> multidestination ->
batch_queue -> hdfs destination).
2. AsyncQueue is, by nature, asynchronous, so you will have to replace it with
something like a FileQueue. I assume that is what you meant by
AuditFileCacheProvider.
3. Ranger audit uses a pipeline, which is a good design for supporting
transformations like dedup, summary, batching, etc. But it also causes
issues with synchronous requirements, because each queue is "Store and
Forward" with its own in-memory buffer. So we might have to give some more
thought to how we make this requirement/design generic enough that it works
across queues and destinations.
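To illustrate points 1 and 3, here is a minimal sketch of chained store-and-forward stages. All names (AuditHandler, BufferingQueue, RecordingDestination) are made up for this example and are not Ranger's actual classes. It only shows why a caller can return from log() while the events are still sitting in some stage's in-memory buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is NOT Ranger's actual API.
interface AuditHandler {
    void log(String event);
    void flush();
}

/** Terminal stage: records what it receives (stands in for HDFS, Solr, ...). */
final class RecordingDestination implements AuditHandler {
    final List<String> received = new ArrayList<>();

    @Override public void log(String event) { received.add(event); }
    @Override public void flush() { /* nothing buffered here */ }
}

/** Store-and-forward stage: buffers in memory, forwards when a batch is full. */
final class BufferingQueue implements AuditHandler {
    private final AuditHandler next;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    BufferingQueue(AuditHandler next, int batchSize) {
        this.next = next;
        this.batchSize = batchSize;
    }

    @Override public void log(String event) {
        buffer.add(event);
        if (buffer.size() >= batchSize) {
            forward();
        }
    }

    /** Pushes everything buffered here downstream, then flushes downstream. */
    @Override public void flush() {
        forward();
        next.flush();
    }

    /** Hands the current buffer to the next stage without flushing it. */
    private void forward() {
        for (String event : buffer) {
            next.log(event);
        }
        buffer.clear();
    }

    int pending() { return buffer.size(); }
}
```

With two such stages in front of a destination, events that have been "logged" can be spread over both buffers and be lost if the process dies before an explicit flush() reaches the end of the chain.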
This is my thought process. I feel BatchQueue is what you need to replace.
BatchQueue batches events in memory and sends them to the Destination (object).
Each Destination object is backed by its own BatchQueue, so that we can support
Destinations with different flow rates. BatchQueue has file spooling/backing
for when the memory gets full.
I feel you should emulate BatchQueue in reverse: write to file first, then
read a batch (window period) from the file and call the Destination.
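A rough sketch of that "file first" idea, under several assumptions: FileFirstQueue and its methods are hypothetical names, events are single-line strings, the delivered offset is kept only in memory, and fsync and file rollover are left out. A real implementation would need to persist the offset and handle all of those.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical file-first queue: append to a spool file, deliver batches later. */
final class FileFirstQueue {
    private final Path spoolFile;
    private final Consumer<List<String>> destination;
    // Number of spooled lines already delivered. Kept in memory only in this
    // sketch; a real queue would persist it so it survives a restart.
    private int delivered = 0;

    FileFirstQueue(Path spoolFile, Consumer<List<String>> destination) {
        this.spoolFile = spoolFile;
        this.destination = destination;
    }

    /** Appends the event to the spool file before returning to the caller. */
    void log(String event) {
        try {
            Files.write(spoolFile,
                    (event + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Reads up to maxBatch undelivered events from the file and hands them to
     * the destination. Returns the number of events sent.
     */
    int drain(int maxBatch) {
        try {
            if (!Files.exists(spoolFile)) {
                return 0;
            }
            List<String> all = Files.readAllLines(spoolFile, StandardCharsets.UTF_8);
            int end = Math.min(all.size(), delivered + maxBatch);
            if (end <= delivered) {
                return 0;
            }
            List<String> batch = new ArrayList<>(all.subList(delivered, end));
            // If the destination throws, 'delivered' is not advanced, so the
            // same batch is offered again on the next drain.
            destination.accept(batch);
            delivered = end;
            return batch.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the ordering is that the event is on disk before log() returns, and the Destination is only ever fed from the file, so a crash between the two steps loses nothing that was already spooled.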
The challenge will be where to place this.
A. Would we have one FileQueue per Destination, or would each Destination
choose its reliability level? E.g., only HDFSDestination needs reliability.
B. What is the reliability requirement/tolerance? Do you want to replace the
upfront AsyncQueue with FileQueue, and then replace the BatchQueue for each
destination with FileQueue?
C. How do you handle SummaryQueue? This will depend on the reliability
requirement and will influence at how many points you need a FileQueue.
This seems complicated, but it is a classical two-phase commit, or doing read
and write in a transaction. So we might need to go through what the reliability
requirements are. While 100% seems to be the obvious expectation, in BigData we
have to ensure that we don't hold up the request due to auditing. Also, with
the current design, there are enough fail-safes to ensure that no more than a
few seconds of audits could be lost under extreme conditions, such as both the
component and the entire destination cluster processes (except HDFS) crashing
at the exact same time.
I feel we should support option "A", where we can pick the destinations that
don't have redundancy of their own and provide it using our reliable store.
Kafka, Solr, etc. support High Availability, so we should be okay there.
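As a small sketch of what option "A" could look like: each destination declares whether it has its own redundancy, and only those without it get the file-backed queue. DestinationPlan and the Buffering enum are hypothetical names for this example, and the destination names are just map keys, not real Ranger configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: decides the buffering strategy per destination. */
final class DestinationPlan {
    enum Buffering { FILE_QUEUE, BATCH_QUEUE }

    /**
     * Destinations without their own redundancy get the file-backed queue;
     * the others keep the in-memory BatchQueue.
     */
    static Map<String, Buffering> plan(Map<String, Boolean> hasOwnRedundancy) {
        Map<String, Buffering> result = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> entry : hasOwnRedundancy.entrySet()) {
            result.put(entry.getKey(),
                    entry.getValue() ? Buffering.BATCH_QUEUE : Buffering.FILE_QUEUE);
        }
        return result;
    }
}
```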
One more thing regarding HDFS flush. HDFS is supposed to auto-flush on its
own. So for your scenario, we need to see what the difference is between flush
and close (and reopening a new file). If you just flush but don't close, and
the NameNode then restarts, do we lose data?
> Ranger Audit framework enhancement to provide an option to allow audit
> records to be spooled to local disk first before sending it to destinations
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: RANGER-1310
> URL: https://issues.apache.org/jira/browse/RANGER-1310
> Project: Ranger
> Issue Type: Bug
> Reporter: Ramesh Mani
> Assignee: Ramesh Mani
>
> Ranger Audit framework enhancement to provide an option to allow audit
> records to be spooled to local disk first before sending it to destinations.
> xasecure.audit.provider.filecache.is.enabled = true ==> This will enable
> this functionality of AuditFileCacheProvider to log the audits locally in a
> file.
> xasecure.audit.provider.filecache.filespool.file.rollover.sec = {rollover
> time - default is 1 day} ==> this provides time to send the audit records
> from local to destination and flush the pipe.
> xasecure.audit.provider.filecache.filespool.dir=/var/log/hadoop/hdfs/audit/spool
> ==> provides the directory where the Audit FileSpool cache is present.
> This helps in avoiding missing / partial audit records in the hdfs
> destination which may happen randomly due to restart of respective plugin
> components.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)