[
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16798203#comment-16798203
]
Feilong He commented on HDFS-14355:
-----------------------------------
[~umamaheswararao], [~rakeshr], Thank you both for your valuable comments! I
have updated our patch according to your suggestions, except for one.
Feedback to Uma's above comment:
# Datanode will clean up all the data under the configured pmem volumes. We
will document this clearly.
# Indeed, #verifyIfValidPmemVolume can and should be in package scope. I have
updated it.
# For pmem, the cache loader currently also serves as the volume manager. It
would be better to move volume management out; I plan to refactor it in the next
subtask.
# Please also refer to Rakesh's above comment. I agree with you both and have
named the Java-based impl PmemMappableBlockLoader. For the native PMDK-based
impl in the next subtask, I will name it NativePmemMappableBlockLoader.
FileMappedBlock has also been renamed to PmemMappedBlock.
# I think using FileChannel#transferTo is a good idea. Its JavaDoc says: "This
method is potentially much more efficient than a simple loop that reads from
this channel and writes to the target channel." We can have more discussions on
it. cc. [~rakeshr]
# As you pointed out, #afterCache just sets the cache path after a block replica
is cached. Only the pmem cache needs to set a path for a cached replica, so we
use mappableBlock#afterCache to do that and avoid an if-else check, which would
be unavoidable if we moved replica#setCachePath into FSDataSetCache. Hope I have
got your point.
# Same as the 6th item.
# We have a catch block for this exception, and inside it the IOException is
rethrown with a specific exception message. Hope this is reasonable.
# Agree with you. I have updated as you suggested.
# Good insight. Currently, if a volume is full, the cache loader will not remove
it. There are some potential improvements to the volume management & selection
strategy; I plan to optimize them in another JIRA.
# I have updated as you suggested.
# Good point. I have just refined that piece of code. Now the exception is
thrown from #loadVolumes only if the count of valid pmem volumes is 0.
# As I discussed with Uma offline, considering the size of current pmem
products, one pmem volume may not be able to cache 20k or more blocks. At that
scale, we think it is OK to cache data into one dir as currently implemented.
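To illustrate the transferTo-based copy discussed in the 5th item, here is a
minimal sketch of copying a block file into a pmem cache file with
FileChannel#transferTo. The class and method names are hypothetical, not from
the patch; the actual loader would also handle checksums and cleanup.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToSketch {
    // Copy a block file into a cache file on the pmem volume using
    // FileChannel#transferTo, which can use zero-copy kernel paths
    // instead of a read-into-buffer / write-from-buffer loop.
    static long copyToCacheFile(Path blockFile, Path cacheFile) throws IOException {
        try (FileChannel src = FileChannel.open(blockFile, StandardOpenOption.READ);
             FileChannel dst = FileChannel.open(cacheFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long size = src.size();
            long pos = 0;
            // transferTo may transfer fewer bytes than requested, so loop
            // until the whole file has been copied.
            while (pos < size) {
                pos += src.transferTo(pos, size - pos, dst);
            }
            return pos;
        }
    }

    public static void main(String[] args) throws IOException {
        Path srcFile = Files.createTempFile("blk_", ".data");
        Path dstFile = Files.createTempFile("cache_", ".data");
        Files.write(srcFile, "hello pmem cache".getBytes());
        long copied = copyToCacheFile(srcFile, dstFile);
        System.out.println(copied == Files.size(dstFile)); // true
    }
}
```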
Thanks again for so many valuable comments from [~umamaheswararao] & [~rakeshr].
Since the new patch has so many updates, I have uploaded it to this JIRA. Some
refinement work is still needed. More importantly, we will discuss the impl
based on FileChannel#transferTo.
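For context on the pure-Java approach in this issue, the following sketch shows
mapping a region of a cache file (on a pmem/DAX mount) into a MappedByteBuffer.
The class and method names are hypothetical and only illustrate the core
ch.map(...) call; the real loader does considerably more bookkeeping.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class PmemMapSketch {
    // Map the first `length` bytes of a cache file into memory. When the
    // file lives on a DAX-mounted pmem volume, reads and writes through
    // the returned buffer go directly to persistent memory. The mapping
    // remains valid after the channel is closed.
    static MappedByteBuffer mapCacheFile(String path, long length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel ch = raf.getChannel()) {
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, length);
        }
    }
}
```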
> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -----------------------------------------------------------------
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: caching, datanode
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch,
> HDFS-14355.002.patch, HDFS-14355.003.patch
>
>
> This task is to implement the caching to persistent memory using pure
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support
> isn't available or convenient in some environments or platforms.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)