[ https://issues.apache.org/jira/browse/RANGER-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961923#comment-15961923 ]
Yan commented on RANGER-1501:
-----------------------------
[~rmani] I tried the RANGER-1310 patch, but it did not help: audit logs were
still lost when close() was not called. I tried
xasecure.audit.provider.filecache.is.enabled set to both true and false, and
the audit logs are lost in both cases.
On performance, I have the following comments:
1) The difference between HDFS flush() and hflush() is that the former flushes
the buffer in local user space to the local OS socket buffer, while the latter
flushes the buffer to the DN's user space.
2) The OS flushes the socket buffer to the DN's socket/user buffer according to
the system configuration and/or resource pressure. This is done asynchronously
by OS threads.
3) AuditHandler.flush() should be invoked asynchronously, as is generally the
case for flush() calls made for non-commit purposes, and as RANGER-1310 points
out for flushing Ranger audit logs to the audit destination. This guarantees
non-blocking execution of mission-critical tasks and keeps the extra cost of
hflush() off their path. If the flush is for commit purposes, however, the call
would need to be synchronous, and the performance impact of hflush() vs.
flush() would be significant.
4) Under system resource pressure, however, non-blocking execution of
mission-critical tasks cannot be guaranteed regardless of how
AuditHandler.flush() is implemented: free resources have to be made available
before the mission-critical task can proceed. For socket buffer flushing, a
larger buffer size benefits more individual flush() calls, but at the cost of
more resource pressure and, consequently, more frequent blocking of
mission-critical calls. In contrast, the hflush() approach needs less socket
buffer, which means less resource pressure and less frequent blocking of
mission-critical calls, at the cost of slower individual AuditHandler.flush()
invocations, which, again, have little impact on asynchronous execution paths.
5) Provided both approaches are invoked asynchronously, and the OS and Ranger
are properly configured, there shouldn't be a fundamental performance
difference between the two.
6) The real difference between flush() and hflush() is control over when local
buffers are flushed to the DN's buffer. With flush(), applications have little
control over the actual timing of the flush; with hflush(), applications
control this timing directly. This is precisely what we want here: to ensure
the data at least reaches the DNs at points determined by Ranger auditing.
This is close, if not identical, to the usual flush() semantics.
Note that HDFS flush() provides essentially no such guarantee; it only moves
the buffers from local user space to local OS space, and is thus little better
than a no-op here.
7) On the other hand, hflush() should have much less performance impact than
hsync(), because it does not flush all the way to the DNs' disks or disk caches.
8) hflush() flushes to all DNs that hold replicas, typically 3 per block, so
larger clusters with more DNs do not translate into slower hflush() calls.
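Points 3 and 6 argue for calling hflush() asynchronously, at moments the audit writer chooses, so the mission-critical thread only pays for an in-memory append. Below is a minimal Java sketch of that pattern under stated assumptions; it is not Ranger's AuditHandler code or the attached patch. Hflushable is a hypothetical stand-in for the hflush() method of Hadoop's org.apache.hadoop.fs.Syncable, FakeHdfsStream is a toy in-memory model (not HDFS) that mirrors the claim that plain flush() publishes nothing, and AsyncAuditWriter and the 50 ms interval are illustrative names and values.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AsyncHflushSketch {
    // Hypothetical stand-in for the hflush() method of org.apache.hadoop.fs.Syncable.
    interface Hflushable {
        void hflush() throws IOException;
    }

    // Toy in-memory model, NOT HDFS: write() and flush() keep bytes in a local
    // buffer; only hflush() moves them to the "DN side" where readers can see them.
    static final class FakeHdfsStream extends OutputStream implements Hflushable {
        private final ByteArrayOutputStream local = new ByteArrayOutputStream();
        private final ByteArrayOutputStream visible = new ByteArrayOutputStream();

        @Override
        public synchronized void write(int b) {
            local.write(b);
        }
        @Override
        public synchronized void write(byte[] b, int off, int len) {
            local.write(b, off, len);
        }
        @Override
        public synchronized void flush() {
            // Models the claim above: plain flush() gives no visibility guarantee.
        }
        @Override
        public synchronized void hflush() {
            byte[] pending = local.toByteArray();
            visible.write(pending, 0, pending.length);
            local.reset();
        }
        synchronized String visibleToReaders() {
            return new String(visible.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Hypothetical audit writer: the caller's (mission-critical) thread only appends
    // to the local buffer; a background thread calls hflush() on a fixed schedule.
    static final class AsyncAuditWriter<S extends OutputStream & Hflushable>
            implements AutoCloseable {
        private final S stream;
        private final ScheduledExecutorService flusher =
                Executors.newSingleThreadScheduledExecutor();

        AsyncAuditWriter(S stream, long intervalMs) {
            this.stream = stream;
            flusher.scheduleWithFixedDelay(this::flushToDataNodes,
                    intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        }

        void log(String record) {
            byte[] bytes = (record + "\n").getBytes(StandardCharsets.UTF_8);
            try {
                stream.write(bytes, 0, bytes.length); // in-memory append only
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        private void flushToDataNodes() {
            try {
                stream.hflush();
            } catch (IOException e) {
                System.err.println("hflush failed: " + e); // a real writer would likely retry or spool
            }
        }

        @Override
        public void close() {
            flusher.shutdown(); // periodic tasks are cancelled by default on shutdown
            try {
                flusher.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            flushToDataNodes(); // final hflush() so nothing stays in the local buffer
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FakeHdfsStream stream = new FakeHdfsStream();
        try (AsyncAuditWriter<FakeHdfsStream> audit = new AsyncAuditWriter<>(stream, 50)) {
            audit.log("user=alice op=read path=/data/a");
            Thread.sleep(200); // give the background hflush() time to run
            System.out.println("visible to readers: " + stream.visibleToReaders().trim());
        }
    }
}
```

Against real HDFS, the stream would be the FSDataOutputStream returned by FileSystem.create(), whose hflush() can throw IOException. A fixed-delay schedule is only one option; a writer could equally trigger hflush() after each batch of records, which is the kind of Ranger-controlled timing point 6 describes.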
> Audit Flush to HDFS does not actually cause the audit logs to be flushed to
> HDFS
> ---------------------------------------------------------------------------------
>
> Key: RANGER-1501
> URL: https://issues.apache.org/jira/browse/RANGER-1501
> Project: Ranger
> Issue Type: Bug
> Components: audit
> Affects Versions: 0.7.0
> Reporter: Yan
> Assignee: Yan
> Fix For: master
>
> Attachments:
> 0001-RANGER-1501-Audit-Flush-to-HDFS-does-not-actually-ca.patch
>
>
> The reason is that the HDFS file stream's flush() call does not really flush
> the data all the way to disk, nor does it even make the data visible to HDFS
> users. See the HDFS semantics of flush/sync at
> https://issues.apache.org/jira/browse/HADOOP-6313.
> Consequently, the audit logs on HDFS won't be visible or durable from an HDFS
> client until the log file is closed. Among other issues, this increases the
> chance of losing audit logs in case of system failure.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)