[ 
https://issues.apache.org/jira/browse/RANGER-1310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15830795#comment-15830795
 ] 

Don Bosco Durai commented on RANGER-1310:
-----------------------------------------

[~rmani], thanks for the write up. A few comments and suggestions:

1. Starting with Ranger Audit V3 (version 3), we started using the terms Queue 
and Destination. This lets us mix processing and destination implementations 
and reuse them (async_queue -> summary_queue -> multidestination -> batch_queue 
-> hdfs_destination)
2. AsyncQueue, by nature, is asynchronous. So you will have to replace it with 
something like a FileQueue. I assume that is what you meant by 
AuditFileCacheProvider
3. Ranger audit uses a pipeline, which is a good design for supporting 
transformations like dedup, summary, batching, etc. But it also causes issues 
with synchronous requirements, because each queue is "store and forward" with 
its own in-memory buffer. So we may have to give some more thought to how we 
make this requirement/design generic enough to work across the pipeline.
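To make the chaining idea above concrete, here is a minimal sketch of pipeline-style stages forwarding to a destination. The class and method names (PipelineSketch, Handler, DedupQueue, ListDestination, chain) are illustrative stand-ins, not the actual Ranger audit classes:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: each "queue" stage forwards to the next handler,
// ending in a destination. Real stages would add batching, spooling, etc.
public class PipelineSketch {
    public interface Handler { void log(String event); }

    // Terminal sink, standing in for a destination such as HDFS or Solr.
    public static class ListDestination implements Handler {
        public final List<String> received = new ArrayList<>();
        public void log(String event) { received.add(event); }
    }

    // Stand-in for a summary stage: drops consecutive duplicate events,
    // then forwards the rest down the chain.
    public static class DedupQueue implements Handler {
        private final Handler next;
        private String last;
        public DedupQueue(Handler next) { this.next = next; }
        public void log(String event) {
            if (!event.equals(last)) next.log(event);
            last = event;
        }
    }

    // Chain stages in front of a destination,
    // mirroring async_queue -> summary_queue -> ... -> destination.
    public static Handler chain(Handler destination) {
        return new DedupQueue(destination);
    }
}
```

The point is only that stages share one interface, so processing steps and destinations can be mixed and reused.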

This is my thought process. I feel BatchQueue is what you need to replace. 
BatchQueue batches events in memory and sends them to the Destination object. 
Each Destination object is backed by its own BatchQueue, so that we can support 
Destinations with different flow rates. BatchQueue spools to file when memory 
gets full.

I feel you should emulate BatchQueue in reverse: write to file first, then read 
a batch (window period) from the file and call the Destination.
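A minimal sketch of that reversed flow, assuming a hypothetical FileQueue and Destination interface (these names are illustrative, not Ranger APIs): events are persisted to a local spool file first, and a consumer later reads a batch from the file and delivers it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Sketch of a "FileQueue": the reverse of BatchQueue. Phase 1 appends the
// event to a spool file; phase 2 reads a batch back and calls the Destination.
public class FileQueue {
    public interface Destination {
        boolean send(List<String> batch); // return false if delivery failed
    }

    private final Path spoolFile;
    private final Destination destination;
    private long committedOffset = 0; // number of lines already delivered

    public FileQueue(Path spoolFile, Destination destination) {
        this.spoolFile = spoolFile;
        this.destination = destination;
    }

    // Phase 1: persist the event locally before considering it accepted.
    public void log(String event) throws IOException {
        Files.write(spoolFile, (event + "\n").getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Phase 2: read the next batch (window) from the file and deliver it.
    // The offset advances only after a successful send, so a crash between
    // the two phases replays events instead of losing them.
    public int deliverBatch(int maxBatch) throws IOException {
        List<String> all = Files.readAllLines(spoolFile);
        if (committedOffset >= all.size()) return 0;
        int end = (int) Math.min(all.size(), committedOffset + maxBatch);
        List<String> batch = new ArrayList<>(all.subList((int) committedOffset, end));
        if (destination.send(batch)) {
            committedOffset = end;
            return batch.size();
        }
        return 0; // destination down: events stay spooled on disk
    }
}
```

A production version would also need to persist the committed offset and roll the spool file, but the write-then-deliver ordering is the essential difference from BatchQueue.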

The challenge will be where to place this.
A. Would we have one FileQueue per Destination, or would each Destination 
choose its reliability level? E.g. only HDFSDestination needs reliability.
B. What is the reliability requirement/tolerance? Do you want to replace the 
upfront AsyncQueue with a FileQueue, and then replace the BatchQueue for each 
destination with a FileQueue?
C. How do you handle SummaryQueue? This will depend on the reliability 
requirement and will influence at how many points you need a FileQueue.

This seems complicated, but it is a classic two-phase commit, or doing the read 
and write in a transaction. So we may need to go through the reliability 
requirements. While 100% seems to be the obvious expectation, in BigData we 
have to ensure that we don't hold up the request due to auditing. Also, with 
the current design, there are enough fail-safes to ensure that no more than a 
few seconds of audits could be lost under extreme conditions, e.g. both the 
component and the entire destination cluster processes (except HDFS) crashing 
at exactly the same time.

I feel we should support option "A", where we pick the destinations that don't 
have redundancy of their own and provide it using our reliable store. Kafka, 
Solr, etc. support High Availability, so those should be okay.

One more thing regarding HDFS flush: HDFS is supposed to auto-flush on its own. 
So for your scenario, we need to see what the difference is between flush and 
close (and reopening a new file). If you just flush but don't close, and the 
NameNode restarts, do we lose data?
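The flush-vs-close distinction can be illustrated with a local-filesystem analogy (HDFS itself makes a similar split on FSDataOutputStream between hflush, hsync, and close, but that needs a running cluster to demonstrate). Here flush() only empties the user-space buffer, while sync() and close() are the durability points:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Local-file analogy for the flush-vs-close question. flush() pushes data
// out of the stream's buffer; getFD().sync() forces it to the storage
// device; close() finalizes the file. Data that is flushed but neither
// synced nor closed is the part at risk across a crash/restart.
public class FlushDemo {
    public static void writeDurably(File f, String data) throws IOException {
        FileOutputStream out = new FileOutputStream(f);
        try {
            out.write(data.getBytes());
            out.flush();        // visible to readers, not yet forced to disk
            out.getFD().sync(); // force contents to the storage device
        } finally {
            out.close();        // release the handle and finalize the file
        }
    }
}
```

For the HDFS case, the open question is exactly which of these guarantees the "auto-flush" actually provides for a file that has not yet been closed.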



> Ranger Audit framework enhancement to provide an option to  allow audit 
> records to be spooled to local disk first before sending it to destinations
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: RANGER-1310
>                 URL: https://issues.apache.org/jira/browse/RANGER-1310
>             Project: Ranger
>          Issue Type: Bug
>            Reporter: Ramesh Mani
>            Assignee: Ramesh Mani
>
> Ranger Audit framework enhancement to provide an option to allow audit 
> records to be spooled to local disk first before sending it to destinations. 
> xasecure.audit.provider.filecache.is.enabled = true ==> This enables the 
> AuditFileCacheProvider functionality to log the audits locally in a 
> file.
> xasecure.audit.provider.filecache.filespool.file.rollover.sec = {rollover 
> time - default is 1 day} ==> this provides the time to send the audit records 
> from local disk to the destination and flush the pipe. 
> xasecure.audit.provider.filecache.filespool.dir=/var/log/hadoop/hdfs/audit/spool
>  ==> provides the directory where the Audit FileSpool cache is present.
> This helps avoid missing / partial audit records in the HDFS 
> destination, which may happen randomly due to restarts of the respective 
> plugin components. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
