[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798203#comment-16798203
 ] 

Feilong He commented on HDFS-14355:
-----------------------------------

[~umamaheswararao], [~rakeshr], Thank you both for your valuable comments! I 
have updated our patch according to your suggestions, except for one.

Feedback to Uma's above comment:
 # The DataNode will clean up all data under the configured pmem volumes. We 
will document this clearly.
 # Indeed, #verifyIfValidPmemVolume can and should be package scoped. I have 
updated it.
 # For pmem, the cache loader currently also serves as the volume manager. It 
would be better to move that logic out; I plan to refactor it in the next 
subtask.
 # Please also refer to Rakesh's comment above. I agree with you both and have 
named the Java-based implementation PmemMappableBlockLoader. The native 
PMDK-based implementation in the next subtask will be named 
NativePmemMappableBlockLoader. FileMappedBlock has also been renamed to 
PmemMappedBlock.
 # I think using FileChannel#transferTo is a good idea. Its Javadoc says: 
"This method is potentially much more efficient than a simple loop that reads 
from this channel and writes to the target channel." We can have more 
discussion on it. cc [~rakeshr]
 # As you pointed out, #afterCache just sets the cache path after the block 
replica is cached. Only the pmem cache needs to set a path for the cached 
replica, so we use mappableBlock#afterCache to do that and avoid an if-else 
check, which would be unavoidable if replica#setCachePath were moved into 
FSDataSetCache. Hope I have understood your point.
 # Same as the 6th item above.
 # We have a catch block for this exception, and inside it the IOException is 
rethrown with a specific exception message. Hope this is reasonable.
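To make the FileChannel#transferTo discussion (5th item) concrete, here is a 
minimal, self-contained sketch of copying a block file into a cache file with 
transferTo instead of a read/write loop. The class and method names are 
illustrative only and are not from the patch; note that transferTo may move 
fewer bytes than requested, so the loop below retries until the copy is 
complete.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToSketch {

    /**
     * Copy a block file into a cache file using FileChannel#transferTo.
     * transferTo can use OS-level fast paths, avoiding a copy through a
     * user-space buffer. Returns the number of bytes copied.
     */
    public static long copyBlock(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may transfer fewer bytes than requested in one call.
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("block", ".data");
        Path dst = Files.createTempFile("cache", ".data");
        Files.write(src, "example block bytes".getBytes());
        long copied = copyBlock(src, dst);
        System.out.println("copied " + copied + " bytes");
        Files.delete(src);
        Files.delete(dst);
    }
}
```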

 # Agreed. I have updated it as you suggested.
 # Good insight. Currently, if a volume is full, the cache loader will not 
remove it from the candidate set. There are potential improvements to volume 
management and the selection strategy; I plan to optimize this in a separate 
JIRA.
 # I have updated it as you suggested.
 # Good point. I have refined that piece of code so that #loadVolumes throws an 
exception only if the count of valid pmem volumes is 0.
 # As I discussed with Uma offline, considering the capacity of current pmem 
products, one pmem volume may not be able to cache 20k or more blocks. At that 
scale, we think it is acceptable to cache data in a single directory per 
volume, as currently implemented.
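Since the pure-Java loader caches via java.nio.MappedByteBuffer (per the issue 
description), a minimal sketch of the core idea follows: map a region of a 
file on the cache volume read-write, copy the replica bytes in, and force them 
to the backing store. The names here are hypothetical, and an ordinary temp 
directory stands in for a real pmem (DAX-mounted) volume.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PmemMapSketch {

    /**
     * Cache a block file by copying its bytes into a memory-mapped region of
     * a file on the (pmem-backed) cache volume. Returns the mapped buffer.
     */
    public static MappedByteBuffer cacheBlock(Path blockFile, Path cacheFile)
            throws IOException {
        byte[] data = Files.readAllBytes(blockFile);
        try (FileChannel channel = FileChannel.open(cacheFile,
                StandardOpenOption.CREATE, StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
            mapped.put(data);
            mapped.force(); // flush the mapped pages to the backing store
            return mapped;
        }
    }

    public static void main(String[] args) throws IOException {
        Path block = Files.createTempFile("block", ".data");
        Path cache = Files.createTempFile("pmem", ".data");
        Files.write(block, "replica bytes".getBytes());
        cacheBlock(block, cache);
        System.out.println(new String(Files.readAllBytes(cache)));
        Files.delete(block);
        Files.delete(cache);
    }
}
```

On a real DAX-mounted pmem filesystem, the same map/put/force sequence writes 
directly into persistent memory, which is what makes the pure-Java approach 
viable without native support.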

Thanks again to [~umamaheswararao] & [~rakeshr] for so many valuable comments.

Since the new patch has so many updates, I have uploaded it to this JIRA. It 
still needs some refinement. More importantly, we will discuss the 
implementation based on FileChannel#transferTo.

 

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -----------------------------------------------------------------
>
>                 Key: HDFS-14355
>                 URL: https://issues.apache.org/jira/browse/HDFS-14355
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: caching, datanode
>            Reporter: Feilong He
>            Assignee: Feilong He
>            Priority: Major
>         Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch, HDFS-14355.003.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
