[jira] [Commented] (HDFS-14740) HDFS read cache persistence support

Rui Mo (Jira) Fri, 13 Sep 2019 01:30:38 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929046#comment-16929046
 ]


Rui Mo commented on HDFS-14740:
-------------------------------

Thanks [~rakeshr] for reviewing the patch and the valuable comments.

In [^HDFS-14740.004.patch] :
{quote}1. Please remove the duplicate checks in the #restoreCache() method, as you 
are already doing the checks inside #createBlockPoolDir().
{quote}
The duplicate checks have been removed.
{quote}2. {{pmemVolume/BlockPoolId/BlockPoolId-BlockId}}. {{BlockPoolId}} is 
duplicated.
{quote}
The file is now named by its BlockId alone, for simplicity.
{quote}3. Can you explore using a hierarchical way of storing 
blocks, similar to the existing datanode data.dir? This is to avoid 
the number of blocks growing under one single blockPoolId. Assume cache capacity in TBs and 
a large set of data blocks in cache under a blockPool. Please refer to 
{{DatanodeUtil.idToBlockDir(finalizedDir, b.getBlockId());}}
{quote}
We now use a hierarchical layout for cache storage, following the implementation in 
DatanodeUtil, so as to avoid storing a large number of blocks under one single 
BlockPoolId.
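
As a rough illustration of the hierarchical layout (a minimal sketch, not the code in the patch): the class name, the "subdir" prefix, and the two 5-bit masks below are assumptions modeled on how DatanodeUtil derives a two-level directory from a block ID, and the actual patch may differ.

```java
import java.nio.file.Path;

/** Hypothetical sketch of a two-level cache directory layout; not the patch code. */
final class PmemCachePathSketch {
  // Assumption: directory names use a "subdir" prefix, as in the datanode data.dir layout.
  private static final String SUBDIR_PREFIX = "subdir";

  /** Maps a block ID to a two-level directory under the block pool directory. */
  static Path idToCacheDir(Path blockPoolDir, long blockId) {
    // Assumption: 5 bits per level, giving at most 32 x 32 leaf directories.
    int d1 = (int) ((blockId >> 16) & 0x1F);
    int d2 = (int) ((blockId >> 8) & 0x1F);
    return blockPoolDir.resolve(SUBDIR_PREFIX + d1).resolve(SUBDIR_PREFIX + d2);
  }

  /** The cache file itself is named by the block ID only. */
  static Path cacheFilePath(Path blockPoolDir, long blockId) {
    return idToCacheDir(blockPoolDir, blockId).resolve(Long.toString(blockId));
  }
}
```

With this scheme, blocks are spread across many small directories instead of accumulating in one directory per block pool.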
{quote}{{4.restoreCache()}} - How about moving specific parsing/restore logic 
to respective MappableBlockLoaders. PmemMappableBlockLoader#restoreCache() and 
NativePmemMappableBlockLoader#restoreCache().
{quote}
We have refactored this part of the implementation. restoreCache() remains in 
PmemVolumeManager to restore some variables, but it calls the specific 
parsing/restore logic in the respective MappableBlockLoaders.
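
The delegation described above could look roughly like the following. This is only a sketch of the pattern: the interface, class names, and method signatures are hypothetical stand-ins and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a loader-specific restore hook. */
interface BlockLoaderSketch {
  /** Loader-specific parsing of one cache file name into a block ID. */
  long restoreCache(String cacheFileName);
}

/** Hypothetical stand-in for the volume manager that owns the shared state. */
final class VolumeManagerSketch {
  private final BlockLoaderSketch loader;
  private final List<Long> restoredBlockIds = new ArrayList<>();

  VolumeManagerSketch(BlockLoaderSketch loader) {
    this.loader = loader;
  }

  /** Restores manager-level state, delegating the parsing to the loader. */
  void restoreCache(List<String> cacheFileNames) {
    for (String name : cacheFileNames) {
      restoredBlockIds.add(loader.restoreCache(name));
    }
  }

  List<Long> restoredBlockIds() {
    return restoredBlockIds;
  }
}
```

The manager keeps the bookkeeping it needs, while each loader decides how a cache file is parsed.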
{quote}5. {{dfs.datanode.cache.persistence.enabled}} - by 
default this can be true, as this will allow users to get the maximum capability of the pmem 
device. Overall the feature is disabled, as the default value of 
"dfs.datanode.cache.pmem.dirs" is empty and the cache will be DRAM based. So, once the 
user enables pmem, they can utilize the potential of this device and there is no 
compatibility concern.
{quote}
{{dfs.datanode.cache.persistence.enabled}} is now true by 
default. The user can enable pmem by 
configuring "dfs.datanode.cache.pmem.dirs".
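
To make the interaction of the two settings concrete, here is a minimal sketch. It uses java.util.Properties as a stand-in for the Hadoop configuration, and the class and method names are hypothetical; only the two property keys and their defaults come from the discussion above.

```java
import java.util.Properties;

/** Hypothetical sketch of how the two settings combine; not the patch code. */
final class CacheConfigSketch {
  static final String PMEM_DIRS_KEY = "dfs.datanode.cache.pmem.dirs";
  static final String PERSIST_KEY = "dfs.datanode.cache.persistence.enabled";

  /** Pmem cache is in use only when at least one pmem directory is configured. */
  static boolean pmemEnabled(Properties conf) {
    // Default is empty, which means the cache is DRAM based.
    return !conf.getProperty(PMEM_DIRS_KEY, "").trim().isEmpty();
  }

  /** Persistence defaults to true but only matters once pmem is enabled. */
  static boolean persistenceActive(Properties conf) {
    boolean persist = Boolean.parseBoolean(conf.getProperty(PERSIST_KEY, "true"));
    return pmemEnabled(conf) && persist;
  }
}
```

Under these assumptions, a default configuration stays DRAM based, and setting only the pmem directories is enough to get a persistent cache.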

Thanks!

> HDFS read cache persistence support
> -----------------------------------
>
>                 Key: HDFS-14740
>                 URL: https://issues.apache.org/jira/browse/HDFS-14740
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Feilong He
>            Assignee: Rui Mo
>            Priority: Major
>         Attachments: HDFS-14740.000.patch, HDFS-14740.001.patch, 
> HDFS-14740.002.patch, HDFS-14740.003.patch, HDFS-14740.004.patch
>
>
> In HDFS-13762, persistent memory (PM) was enabled in HDFS centralized cache 
> management. Even though PM can persist cache data, to simplify the 
> initial implementation, the previous cache data is cleaned up when the 
> DataNode restarts. Here, we propose to improve the HDFS PM cache by taking 
> advantage of PM's data persistence, i.e., recovering the cache 
> status when the DataNode restarts, so that cache warm-up time is saved for the user.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

