[
https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929046#comment-16929046
]
Rui Mo commented on HDFS-14740:
-------------------------------
Thanks [~rakeshr] for reviewing the patch and the valuable comments.
In [^HDFS-14740.004.patch] :
{quote}1. Please remove duplicate checks in #restoreCache() method as you
already doing the checks inside #createBlockPoolDir().
{quote}
The duplicate checks have been removed.
{quote}2. {{pmemVolume/BlockPoolId/BlockPoolId-BlockId}}. {{BlockPoolId}} is
duplicated.
{quote}
The cache file is now named by its BlockId alone, for simplicity.
{quote}3. Can you explore the chances of using hierarchical way of storing
blocks similar to the existing datanode data.dir, this is to avoid chances of
growing blocks under one single blockPoolId. Assume cache capacity in TBs and
large set of data blocks in cache under a blockPool. Please refer
{{DatanodeUtil.idToBlockDir(finalizedDir, b.getBlockId());}}
{quote}
We now use a hierarchical layout for cache storage, following the
implementation in DatanodeUtil, so as to avoid storing a large number of blocks
directly under a single BlockPoolId directory.
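For illustration, a minimal sketch of such a hierarchical mapping is shown below. It is not the code from the patch: the class and method names, the "subdir" prefix, and the shift/mask values are assumptions modeled on the two-level layout that DatanodeUtil.idToBlockDir uses for data.dir.

```java
import java.io.File;

public class CacheBlockDirSketch {
    // Assumed values: two directory levels, 32 entries each, selected by
    // bits of the block id. The real patch may use different constants.
    private static final String SUBDIR_PREFIX = "subdir";
    private static final int LEVEL_MASK = 0x1F;

    /** Maps a block id to a two-level subdirectory under the block pool dir. */
    public static File idToCacheDir(File blockPoolDir, long blockId) {
        int d1 = (int) ((blockId >> 16) & LEVEL_MASK);
        int d2 = (int) ((blockId >> 8) & LEVEL_MASK);
        String path = SUBDIR_PREFIX + d1 + File.separator + SUBDIR_PREFIX + d2;
        return new File(blockPoolDir, path);
    }

    /** Builds pmemVolume/BlockPoolId/subdirX/subdirY/BlockId. */
    public static File cacheFile(File pmemVolume, String blockPoolId, long blockId) {
        File blockPoolDir = new File(pmemVolume, blockPoolId);
        return new File(idToCacheDir(blockPoolDir, blockId), Long.toString(blockId));
    }

    public static void main(String[] args) {
        // Hypothetical volume path and block pool id, for demonstration only.
        File f = cacheFile(new File("/mnt/pmem0"), "BP-1", 1073741825L);
        System.out.println(f.getPath());
    }
}
```

With this kind of mapping, blocks in one block pool are spread over up to 32 x 32 leaf directories instead of accumulating in a single directory.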
{quote}{{4.restoreCache()}} - How about moving specific parsing/restore logic
to respective MappableBlockLoaders. PmemMappableBlockLoader#restoreCache() and
NativePmemMappableBlockLoader#restoreCache().
{quote}
We have refactored this part of the implementation. restoreCache() remains in
PmemVolumeManager to restore some variables, but it now calls the specific
parsing/restore logic in the respective MappableBlockLoaders.
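The shape of this delegation can be sketched roughly as follows. This is only an illustration under assumptions: the interface, class, and field names below are hypothetical stand-ins, not the actual PmemVolumeManager or MappableBlockLoader APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RestoreDelegationSketch {
    /** Hypothetical stand-in for the restore hook of a block loader. */
    public interface CacheLoader {
        /** Parses one cache file name and returns the block id it represents. */
        long restoreEntry(String cacheFileName);
    }

    /** Hypothetical loader that treats the file name as the block id. */
    public static class FileNameLoader implements CacheLoader {
        @Override
        public long restoreEntry(String cacheFileName) {
            return Long.parseLong(cacheFileName);
        }
    }

    /** Hypothetical manager: keeps volume-level state, delegates parsing. */
    public static class VolumeManager {
        private final CacheLoader loader;
        private final List<Long> restoredBlockIds = new ArrayList<>();
        private long usedBytes;

        public VolumeManager(CacheLoader loader) {
            this.loader = loader;
        }

        /** Restores manager state from a map of cache file name to length. */
        public void restoreCache(Map<String, Long> cacheFiles) {
            for (Map.Entry<String, Long> e : cacheFiles.entrySet()) {
                // Loader-specific logic decides how a file maps to a block.
                restoredBlockIds.add(loader.restoreEntry(e.getKey()));
                // The manager only updates its own bookkeeping.
                usedBytes += e.getValue();
            }
        }

        public long getUsedBytes() {
            return usedBytes;
        }

        public List<Long> getRestoredBlockIds() {
            return restoredBlockIds;
        }
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("101", 4096L);
        files.put("102", 1024L);
        VolumeManager manager = new VolumeManager(new FileNameLoader());
        manager.restoreCache(files);
        System.out.println(manager.getRestoredBlockIds() + " " + manager.getUsedBytes());
    }
}
```

Keeping the bookkeeping in the manager and the parsing in the loader means each loader type can define its own on-disk format without the manager changing.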
{quote}5. {{dfs.datanode.cache.persistence.enabled}} - by
default this can be true as this will allow to get maximum capabilities of pmem
device. Overall the feature is disabled and default value of
"dfs.datanode.cache.pmem.dirs" is empty and will be DRAM based. So, once the
user enables pmem, they can utilize the potential of this device and no case of
compatibility.
{quote}
{{dfs.datanode.cache.persistence.enabled}} is now true by default. The user can
enable pmem by configuring "dfs.datanode.cache.pmem.dirs".
Thanks!
> HDFS read cache persistence support
> -----------------------------------
>
> Key: HDFS-14740
> URL: https://issues.apache.org/jira/browse/HDFS-14740
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Feilong He
> Assignee: Rui Mo
> Priority: Major
> Attachments: HDFS-14740.000.patch, HDFS-14740.001.patch,
> HDFS-14740.002.patch, HDFS-14740.003.patch, HDFS-14740.004.patch
>
>
> In HDFS-13762, persistent memory (PM) is enabled in HDFS centralized cache
> management. Even though PM can persist cache data, for simplifying the
> initial implementation, the previous cache data will be cleaned up during
> DataNode restarts. Here, we are proposing to improve HDFS PM cache by taking
> advantage of PM's data persistence characteristic, i.e., recovering the cache
> status when DataNode restarts, thus, cache warm up time can be saved for user.