[jira] [Commented] (RANGER-1501) Audit Flush to HDFS does not actually cause the audit logs to be flushed to HDFS

Yan (JIRA) Sat, 08 Apr 2017 12:44:00 -0700

    [ 
https://issues.apache.org/jira/browse/RANGER-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961923#comment-15961923
 ]


Yan commented on RANGER-1501:
-----------------------------

[~rmani] I tried the RANGER-1310 patch and it did not work: I still lost 
audits when close() was not called. I tried 
xasecure.audit.provider.filecache.is.enabled set to both true and false, and 
the audit logs were lost in both cases.

On the performance, I have the following comments:

1) The difference between HDFS flush() and hflush() is that the former flushes 
the buffer in local user space to the local OS socket buffer, while the latter 
flushes the buffer to the DN's user space.
2) The OS flushes the socket buffer to the DN's socket/user buffer based on the 
system configuration and/or resource pressure. This is done asynchronously by 
OS threads.
3) AuditHandler.flush() should be invoked asynchronously, as is generally the 
case for flush() calls made for non-commit purposes, and as pointed out in 
RANGER-1310 for flushing Ranger audit logs to the audit destination. This 
guarantees non-blocking execution of the mission-critical tasks. If the flush 
is for commit purposes, however, the call would need to be synchronous, and the 
performance impact of hflush() vs. flush() would be significant.
4) Under system resource pressure, however, non-blocking execution of 
mission-critical tasks can't be guaranteed with either AuditHandler.flush() 
implementation: resources have to be freed before the mission-critical task 
can proceed. With socket buffer flushing, a larger buffer size benefits 
individual flush() calls, but at the cost of more resource pressure and 
consequently more frequent blocking of mission-critical calls. In contrast, 
the hflush() approach needs less socket buffer, which means less resource 
pressure and less frequent blocking of mission-critical calls, at the cost of 
slower individual AuditHandler.flush() invocations, which, again, matters 
little on asynchronous execution paths.
5) Given asynchronous invocation in both approaches, plus proper OS/Ranger 
configuration, there shouldn't be a fundamental performance difference between 
the two.
6) The real difference between flush() and hflush() is control over when local 
buffers are flushed to the DN's buffer. With flush(), applications have little 
control over the "real" flush timing, while with hflush() they control it 
directly. This is precisely what we want here: to ensure that the data at 
least reaches the DNs at junctures determined by Ranger auditing. This is 
close, if not identical, to the usual flush() semantics. Note that HDFS 
flush() gives no such guarantee; it only releases the buffers in local user 
space to local OS space, and is thus little better than a no-op here.
7) On the other hand, hflush() should have much less performance impact than 
hsync() because it does not flush all the way to DN's disks or disk caches.
8) hflush() flushes to all DNs that hold replicas, typically 3 per block, so 
larger clusters with more DNs do not translate to a slower hflush().
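
To make points 3) and 5) concrete, here is a minimal sketch of running 
hflush() off the caller's thread. It is not the Ranger code or the attached 
patch. HFlushable is a hypothetical stand-in for the hflush() method of 
Hadoop's Syncable interface, so the example compiles without Hadoop on the 
classpath; AsyncAuditFlusher and its method names are likewise made up for 
illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the hflush() method of org.apache.hadoop.fs.Syncable. */
interface HFlushable {
    void hflush() throws IOException;
}

/** Runs hflush() on a dedicated thread so callers do not wait on the DN pipeline. */
final class AsyncAuditFlusher implements AutoCloseable {
    private final HFlushable stream;
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    AsyncAuditFlusher(HFlushable stream) {
        this.stream = stream;
    }

    /** Non-commit path: schedule the flush and return immediately. */
    CompletableFuture<Void> flushAsync() {
        return CompletableFuture.runAsync(() -> {
            try {
                stream.hflush();
            } catch (IOException e) {
                // Surface the failure through the returned future.
                throw new UncheckedIOException(e);
            }
        }, worker);
    }

    /** Commit path: the caller waits for hflush() to finish. */
    void flushSync() {
        flushAsync().join();
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}

final class AuditFlushSketch {
    public static void main(String[] args) {
        AtomicInteger flushed = new AtomicInteger();
        // Fake stream that only counts hflush() calls; a real one would be an HDFS output stream.
        HFlushable fakeStream = flushed::incrementAndGet;

        try (AsyncAuditFlusher flusher = new AsyncAuditFlusher(fakeStream)) {
            CompletableFuture<Void> pending = flusher.flushAsync(); // caller is not blocked
            pending.join();                                         // wait only when needed
            flusher.flushSync();                                    // commit-style flush
        }
        System.out.println("hflush() calls: " + flushed.get());
    }
}
```

With this shape, the slower hflush() is paid for by the worker thread, and 
only a commit-style caller that uses flushSync() sees its full latency.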

> Audit Flush to HDFS does not actually cause the audit logs to be flushed to 
> HDFS 
> ---------------------------------------------------------------------------------
>
>                 Key: RANGER-1501
>                 URL: https://issues.apache.org/jira/browse/RANGER-1501
>             Project: Ranger
>          Issue Type: Bug
>          Components: audit
>    Affects Versions: 0.7.0
>            Reporter: Yan
>            Assignee: Yan
>             Fix For: master
>
>         Attachments: 
> 0001-RANGER-1501-Audit-Flush-to-HDFS-does-not-actually-ca.patch
>
>
> The reason is that the HDFS file stream's flush() call does not really flush 
> the data all the way to disk, nor does it even make the data visible to HDFS 
> users. See the HDFS flush/sync semantics at 
> https://issues.apache.org/jira/browse/HADOOP-6313.
> Consequently the audit logs on HDFS won't be visible/durable from an HDFS 
> client until the log file is closed. Among other issues, this increases the 
> chance of losing audit logs in case of a system failure.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
