[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800514#comment-16800514
 ] 

Feilong He commented on HDFS-14355:
-----------------------------------

[~daryn], thanks so much for your valuable comments.
{quote}I quickly skimmed the patch. At a high level, a plugin design should not 
leak details. This patch breaks the abstraction and litters the code with 
references to "pmem" and explicit conditionals. The details of pmem should be 
pushed down and hidden in the pmem specific classes.
{quote}
We tried to separate our pmem cache implementation from the existing HDFS code 
by introducing pmem-specific classes such as PmemMappableBlockLoader, 
PmemMappedBlock and PmemVolumeManager. In FsDatasetCache, we did keep some 
pmem references, for example pmemUsedBytesCount, which may be one issue you 
pointed out implicitly. In our new patch, pmemUsedBytesCount and the 
reserve/release methods for pmem will be moved from FsDatasetCache into the 
new PmemVolumeManager class. We are hiding such unnecessarily exposed pmem 
details as you suggested.
{quote}Adding another reference (cacheFilePath) to {{ReplicaInfo}} is less than 
desirable from a memory perspective. For those not using the feature, it's not 
nearly as bad as adding one to inodes but may be an issue for very dense DNs. 
More importantly, those that DO use the feature will incur a substantial memory 
hit to store the paths. Why does a SCM block need to track a completely 
different path? Isn't the SCM just another storage dir?
{quote}
The cacheFilePath in ReplicaInfo is used only for the pmem cache. Since the 
user can configure multiple pmem volumes, such info is needed to locate a 
cached block's file on pmem. We also seriously considered your concern that 
adding cacheFilePath there may cause issues for very dense DNs; thanks for 
pointing that out. We will optimize this part in a new patch by removing 
cacheFilePath from ReplicaInfo. Instead, a PmemVolumeManager will keep, per 
cached block, just the pmem volume index, which is enough to infer the cache 
file path in our new implementation. A TB-sized HDFS pmem cache should hold 
around 10K blocks at most, and in our evaluation, maintaining the 
block-to-volume mapping for that many blocks does not consume substantial 
DRAM. On the other hand, enabling the pmem cache can alleviate the pressure 
of competing for DRAM.
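To make the idea concrete, here is a minimal, hypothetical sketch of how such 
a PmemVolumeManager could keep only a per-block volume index and infer the 
cache file path from it. The field and method names (and the path layout) are 
illustrative assumptions, not the actual patch:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; the real patch's PmemVolumeManager differs.
class PmemVolumeManager {
  private final String[] pmemVolumes;           // configured pmem mount points
  // blockId -> pmem volume index: one small entry per cached block instead
  // of a full path string, keeping the DRAM footprint low for ~10K blocks.
  private final Map<Long, Byte> blockToVolume = new ConcurrentHashMap<>();

  PmemVolumeManager(String[] pmemVolumes) {
    this.pmemVolumes = pmemVolumes;
  }

  void recordCachedBlock(long blockId, byte volumeIndex) {
    blockToVolume.put(blockId, volumeIndex);
  }

  // The full cache file path is inferred from the stored volume index.
  String inferCacheFilePath(long blockId) {
    Byte index = blockToVolume.get(blockId);
    if (index == null) {
      throw new IllegalStateException("Block " + blockId + " is not cached");
    }
    return pmemVolumes[index] + "/" + blockId;
  }
}
{code}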
{quote}{{getCacheLoaderClaZZ}} should be {{getCacheLoaderClaSS}} and the 
returned {{Class}} must be instantiated. It's wrong to compare the class simple 
name against {{MemoryMappableBlockLoader}}. Consider if a user configures with 
{{my.custom.MemoryMappableBlockLoader}} and it instantiates 
{{org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader}}
 anyway because of the matching simple name? Catching 
{{ReflectiveOperationException}} and just logging it masks a serious 
misconfiguration or error condition.
{quote}
The {{getCacheLoaderClaZZ}} has been renamed to {{getCacheLoaderClass}} as you 
suggested. Your comment made us aware that comparing the class simple name is 
unreliable; we have fixed this issue in our new patch. Since the cache 
loader's constructor requires an {{FsDatasetCache}} instance as its parameter, 
we still instantiate the loader there, as you noted. {{getCacheLoaderClass}} 
belongs to DNConf, which should not depend on {{FsDatasetCache}} to return an 
instantiated cache loader. As you pointed out, it is not reasonable to catch 
{{ReflectiveOperationException}} and merely log it. We now throw a 
RuntimeException inside the catch block to terminate DataNode startup.
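A hedged sketch of that instantiation pattern, with stub types standing in 
for the real HDFS classes (only the pattern is shown, not the actual code): 
the loader is created from the configured {{Class}} object, and a reflection 
failure is rethrown as a fatal RuntimeException with the original cause.
{code:java}
import java.lang.reflect.Constructor;

// Minimal stand-ins for the real HDFS types, for illustration only.
class FsDatasetCache {}
abstract class MappableBlockLoader {
  protected final FsDatasetCache cache;
  MappableBlockLoader(FsDatasetCache cache) { this.cache = cache; }
}
class MemoryMappableBlockLoader extends MappableBlockLoader {
  public MemoryMappableBlockLoader(FsDatasetCache cache) { super(cache); }
}

class CacheLoaderFactory {
  // Instantiate the configured loader through its Class object instead of
  // comparing simple names; rethrow reflection failures with the cause so
  // a misconfiguration terminates DataNode startup rather than being
  // silently logged.
  static MappableBlockLoader newLoader(
      Class<? extends MappableBlockLoader> clazz, FsDatasetCache cache) {
    try {
      Constructor<? extends MappableBlockLoader> ctor =
          clazz.getConstructor(FsDatasetCache.class);
      return ctor.newInstance(cache);
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(
          "Failed to instantiate cache loader " + clazz.getName(), e);
    }
  }
}
{code}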
{quote}There's quite a few other exceptions being swallowed or just logged when 
dubious "bad things" happen. Or discarding of exceptions and rethrowing generic 
"something went wrong" exceptions w/o the original exception as a cause. That 
complicates debugging.
{quote}
We are checking the code scrupulously and doing our best to fix the issues 
you mentioned so that debugging is not complicated. When loading pmem 
volumes, we only log an error if a single volume is unusable; we think it is 
tolerable for one configured pmem volume to be bad. But if all pmem volumes 
are invalid, an exception is thrown to terminate DataNode startup.
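That loading policy can be sketched as follows. The directory checks, method 
name and messages are illustrative assumptions, not the patch itself: one bad 
volume is logged and skipped, but if no volume is usable, an IOException 
aborts startup.
{code:java}
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class PmemVolumeLoader {
  // Hypothetical sketch of the policy described above.
  static List<String> loadVolumes(String[] configured) throws IOException {
    List<String> usable = new ArrayList<>();
    for (String dir : configured) {
      File f = new File(dir);
      if (f.isDirectory() && f.canWrite()) {
        usable.add(dir);
      } else {
        // Tolerable: log the bad volume and continue with the rest.
        System.err.println("Skipping unusable pmem volume: " + dir);
      }
    }
    if (usable.isEmpty()) {
      // Fatal: no usable pmem volume, so terminate DataNode startup.
      throw new IOException("None of the " + configured.length
          + " configured pmem volumes is usable");
    }
    return usable;
  }
}
{code}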
{quote}New methods in {{FsDatasetUtil}} are not ok. Invalid arguments are not 
"ignorable". That's how bugs creep in which are insanely hard to debug. Don't 
check if offset is valid, just seek, let it throw if invalid. Don't ignore null 
filenames, just let it throw. Catching {{Throwable}} is a bad practice; let 
alone catching, discarding, and throwing a bland exception.
{quote}
Good suggestion! We have noted the issues in {{FsDatasetUtil}}. As you 
pointed out, an invalid offset or a null filename shouldn't be ignored. We 
are also checking for and fixing such issues in other parts of the code.
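For illustration, a minimal example of the suggested style (not the actual 
{{FsDatasetUtil}} code; the method name is hypothetical): seek directly and 
let the call throw, instead of pre-validating the offset or catching 
{{Throwable}}.
{code:java}
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class SeekExample {
  // Do not pre-check the offset or swallow exceptions: seek() already
  // throws IOException for a negative offset, and readByte() throws
  // EOFException past the end of file, preserving the real cause.
  static byte readAt(File file, long offset) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
      raf.seek(offset);
      return raf.readByte();
    }
  }
}
{code}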
{quote}Why throw a {{RuntimeException}} in "getOneLocation"? Is the intent to 
take down the DN?
{quote}
Indeed, it is not reasonable to throw a RuntimeException in #getOneLocation. 
It is replaced by an IOException in our new patch. That exception then just 
leads to a typical cache failure and will not impact the DataNode's 
operation.
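A tiny hypothetical sketch of the difference (the signature and check are 
illustrative, not the patched method): a checked IOException lets the caller 
treat the failure as an ordinary cache failure instead of an unchecked 
exception that could take down the DataNode.
{code:java}
import java.io.IOException;

class GetOneLocationExample {
  // Throwing IOException (checked) forces callers to handle the failure
  // on the normal cache-failure path, rather than letting an unchecked
  // RuntimeException propagate up through the DataNode.
  static String getOneLocation(String[] pmemDirs, int index)
      throws IOException {
    if (index < 0 || index >= pmemDirs.length) {
      throw new IOException("Invalid pmem volume index: " + index);
    }
    return pmemDirs[index];
  }
}
{code}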
{quote}Why do a costly compute of the checksum for a mmap file? Isn't the 
client going to compute the checksum anyway?
{quote}
As you may know, the current HDFS DRAM cache computes a checksum when caching 
data to DRAM. In our implementation, the pmem cache still computes the 
checksum to guarantee the reliability of the cached data, which aligns with 
the behavior of the HDFS DRAM cache.

 

Thank you again for your valuable comments. We are still refining the code 
based on your comments. After that, a new patch will be uploaded.

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -----------------------------------------------------------------
>
>                 Key: HDFS-14355
>                 URL: https://issues.apache.org/jira/browse/HDFS-14355
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: caching, datanode
>            Reporter: Feilong He
>            Assignee: Feilong He
>            Priority: Major
>         Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
