[ https://issues.apache.org/jira/browse/HDFS-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155818#comment-14155818 ]
Chris Nauroth commented on HDFS-6919:
-------------------------------------
I've always thought of cache pools as an abstraction over {{ulimit -l}}. The
cache pool defines permissions and an upper limit on the number of bytes that
can be locked into memory via cache directives in that cache pool. When an
admin creates a cache pool of a certain size and grants access to a set of
users, it's analogous to using {{ulimit}} to restrict those users' hard limit
for locked memory.
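To make the {{ulimit}} analogy concrete, here is a minimal sketch of that admin
workflow through the HDFS client API (the pool name, group, and 1GB limit are
made-up values; it assumes {{fs.defaultFS}} points at the cluster):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

public class CachePoolExample {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());

    // Pool capped at 1GB of locked memory; only "analysts" group members
    // may add directives to it. This is the ulimit -l analogy: the cap
    // bounds how much these users can pin in memory, cluster-wide.
    dfs.addCachePool(new CachePoolInfo("analystsPool")
        .setOwnerName("hdfs")
        .setGroupName("analysts")
        .setMode(new FsPermission((short) 0770))
        .setLimit(1024L * 1024 * 1024));

    // A directive locks a path into memory, charged against the pool.
    long directiveId = dfs.addCacheDirective(new CacheDirectiveInfo.Builder()
        .setPath(new Path("/data/hot"))
        .setPool("analystsPool")
        .build());
  }
}
{code}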
In-memory replica writes behave a lot like virtual memory. Clients simply
write, and if lazy-persist is enabled, the DataNode is free to buffer any
amount of the written data in memory. This is best-effort, though: contention
for RAM triggers fallback to disk for some portion of the in-memory data,
which is analogous to paging. Since the RAM-vs.-disk distinction is largely
abstracted away from the client writer, this gives us a lot of freedom to
evolve smarter cache eviction policies over time.
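For reference, the client side of that contract is just a create flag; a
minimal sketch using the {{LAZY_PERSIST}} flag that HDFS-6581 added (the path
and sizes here are arbitrary):
{code:java}
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class LazyPersistWrite {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/scratch.dat");  // arbitrary example path

    // LAZY_PERSIST is the only client-visible difference: the DataNode
    // may stage the replica on RAM disk and persist it asynchronously,
    // falling back to disk under memory pressure without the client
    // having to do anything -- the "paging" behavior described above.
    FSDataOutputStream out = fs.create(path,
        FsPermission.getFileDefault(),
        EnumSet.of(CreateFlag.CREATE, CreateFlag.LAZY_PERSIST),
        4096,                          // buffer size
        (short) 1,                     // lazy-persist writes use a single replica
        fs.getDefaultBlockSize(path),
        null);                         // no progress callback
    out.write(new byte[64 * 1024]);
    out.close();
  }
}
{code}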
Arpit has also pointed out that a single writer can write over 1.5GB to memory
in 3 seconds, and this may improve as we find optimizations in the write
pipeline. With multiple concurrent writers, RAM disk usage could change
entirely within a single heartbeat interval. That conflicts with cache pool
enforcement, which happens centrally at the NameNode and is therefore subject
to the latency of the heartbeat interval.
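To put numbers on that staleness, a quick back-of-the-envelope sketch (the
writer count is an assumption; the rate comes from the 1.5GB-in-3-seconds
figure above, and 3 seconds is the default {{dfs.heartbeat.interval}}):
{code:java}
public class HeartbeatDrift {
  public static void main(String[] args) {
    final long bytesPerWriterPerSec = 1_500_000_000L / 3;  // ~0.5 GB/s per writer
    final long heartbeatIntervalSec = 3;   // dfs.heartbeat.interval default
    final int concurrentWriters = 4;       // assumed workload

    long driftBytes = bytesPerWriterPerSec * heartbeatIntervalSec * concurrentWriters;
    System.out.printf("RAM disk usage can move by ~%.1f GB between heartbeats%n",
        driftBytes / 1e9);
    // => ~6.0 GB. Any NameNode-side quota decision can be stale by this
    // much, which is why central cache pool enforcement fits poorly here.
  }
}
{code}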
Considering the above factors, I don't see a beneficial way to apply cache
pools to in-memory replica writes. Using a single cache pool named lazyPersist
would sidestep one of the main features of cache pools: the ability to control
different limits for different users. Doing something more sophisticated, such
as trying to match existing cache directives to the paths of in-memory replica
writes, implies tighter coupling between the NameNode and the DataNode to pass
cache pool information around (it is currently encapsulated at the NameNode).
It's also possible that none of this enforcement would be effective, given the
latency of the heartbeat interval. Conceptually, users might find cache pools
confusing here, since there is no analogous OS knob that enforces a quota on
what portion of a traditional file-descriptor write goes to the buffer cache.
> Enforce a single limit for RAM disk usage and replicas cached via locking
> -------------------------------------------------------------------------
>
> Key: HDFS-6919
> URL: https://issues.apache.org/jira/browse/HDFS-6919
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Arpit Agarwal
> Assignee: Colin Patrick McCabe
> Priority: Blocker
>
> The DataNode can have a single limit for memory usage which applies to both
> replicas cached via CCM and replicas on RAM disk.
> See comments
> [1|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106025&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106025],
> [2|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106245&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106245]
> and
> [3|https://issues.apache.org/jira/browse/HDFS-6581?focusedCommentId=14106575&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14106575]
> for discussion.
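For what it's worth, a minimal sketch of what such a single limit could look
like on the DataNode, assuming the existing CCM ceiling
{{dfs.datanode.max.locked.memory}} is reused as the combined ceiling (the
issue leaves the exact mechanism open; the two usage counters are
hypothetical):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

public class CombinedLimitSketch {
  private final long maxLockedBytes;
  private long usedByCache;    // bytes locked via CCM (hypothetical counter)
  private long usedByRamDisk;  // bytes of lazy-persist replicas on RAM disk

  CombinedLimitSketch(Configuration conf) {
    // One DataNode-wide ceiling shared by both memory consumers.
    maxLockedBytes = conf.getLong(
        DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_KEY,
        DFSConfigKeys.DFS_DATANODE_MAX_LOCKED_MEMORY_DEFAULT);
  }

  // Both the cache and the RAM disk writer reserve through one gate;
  // on failure the caller falls back (skip caching, or spill to disk).
  synchronized boolean tryReserve(long bytes, boolean forRamDisk) {
    if (usedByCache + usedByRamDisk + bytes > maxLockedBytes) {
      return false;
    }
    if (forRamDisk) {
      usedByRamDisk += bytes;
    } else {
      usedByCache += bytes;
    }
    return true;
  }
}
{code}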