[
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797627#comment-16797627
]
Uma Maheswara Rao G commented on HDFS-14355:
--------------------------------------------
[~PhiloHe] Thanks for working on this. Thanks [~rakeshr] for the thorough reviews.
I have a few more comments to address.
# // Remove all files under the volume.
FileUtils.cleanDirectory(locFile);
Is it safe to clean the existing files here?
# public static void verifyIfValidPmemVolume(File pmemDir)
Why is this public? Wouldn't package scope work?
# FileMappableBlockLoader: currently this implementation is handling more than
block loading. Ex: volume management. You may want to consider having a
PmemVolumeManager class which can manage all these volumes, verification etc.
(see the sketch below)? I am ok if you want to take this up in another JIRA.
Similar/same volume management may be needed in HDFS-14356 as well. Considering
that, it is worth moving to a common place. What do you say?
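To illustrate the idea, here is a rough sketch of what such a PmemVolumeManager
could look like (all names and method bodies below are hypothetical, not taken
from the patch):

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical manager that owns pmem volume discovery, verification and
// selection, so the loader classes only deal with block loading itself.
class PmemVolumeManager {
  private final List<File> volumes = new ArrayList<>();
  private int nextIndex = 0;

  PmemVolumeManager(String[] volumeDirs) throws IOException {
    for (String dir : volumeDirs) {
      File volume = new File(dir);
      verifyIfValidPmemVolume(volume);
      volumes.add(volume);
    }
  }

  // Simple round-robin selection; a smarter policy could consider
  // storage utilization, as the TODO in the patch suggests.
  synchronized File getOneLocation() throws IOException {
    if (volumes.isEmpty()) {
      throw new IOException("No valid pmem volume available");
    }
    File volume = volumes.get(nextIndex);
    nextIndex = (nextIndex + 1) % volumes.size();
    return volume;
  }

  static void verifyIfValidPmemVolume(File pmemDir) throws IOException {
    if (!pmemDir.isDirectory()) {
      throw new IOException(pmemDir + " is not a valid pmem volume directory");
    }
  }
}

Such a class could then be shared between this loader and the HDFS-14356 one.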
# FileMappableBlockLoader: Actually this class implements the block loading
for pmem. So, should this name be
PmemFileMappableBlockLoader/PmemMappableBlockLoader?
The HDFS-14356 implementation may then name its class
NativePmemMappableBlockLoader (that will be the PMDK based impl)? Does this make
sense? cc [~rakeshr]
# FileMappableBlockLoader: Currently you are reading data from the input
stream, verifying the checksum, and writing that buffer to the MBB.
One thought here is: how about using the FileChannel#transferTo API for
transferring data from one channel to the other natively, and then doing mmap
on the destination file (assuming mmap may be faster on the target file) and
doing the checksum verification on it? See the sketch below.
I have not checked this here, just checking whether it gives any further
improvement, as we would avoid reading from HDD into user space.
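To make the thought concrete, a minimal sketch of the transferTo + mmap flow
(the method name and the verifyChecksum hook are placeholders, not the patch's
actual code):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Copy the block channel-to-channel (potentially zero-copy in the kernel),
// then mmap the pmem copy and verify the checksum over the mapped buffer.
static MappedByteBuffer loadViaTransferTo(FileChannel blockChannel,
    String pmemFilePath, long length) throws IOException {
  try (FileChannel pmemChannel = FileChannel.open(Paths.get(pmemFilePath),
      StandardOpenOption.CREATE, StandardOpenOption.READ,
      StandardOpenOption.WRITE)) {
    long transferred = 0;
    while (transferred < length) {
      transferred += blockChannel.transferTo(transferred,
          length - transferred, pmemChannel);
    }
    MappedByteBuffer mbb =
        pmemChannel.map(FileChannel.MapMode.READ_WRITE, 0, length);
    // verifyChecksum(mbb);  // reuse the existing checksum logic here
    return mbb;
  }
}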
# FileMappedBlock: afterCache basically gets the block replica and sets the
path, and this method is used in FsDatasetCache. Why not expose the required
info from MappedBlock, that is the key and filePath, and call
replica#setCachePath directly in FsDatasetCache? Something like:
ReplicaInfo replica = dataset.getBlockReplica(key.getBlockPoolId(),
key.getBlockId());
replica.setCachePath(filePath);
# if (!result) {
throw new IOException();
}
Please add a message here, with the possible reason for the failure etc. See
the example below.
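For example, something along these lines (the exact wording is only a
suggestion, depending on what the result flag means at this point):

if (!result) {
  throw new IOException("Failed to map block " + key
      + " into persistent memory cache; please check the pmem volume");
}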
# Config Description: "Currently, there are two cache loaders implemented:"
I don't think you need to mention the exact number of loaders (1/2/3); instead
you can say "currently available cache loaders are x, y". That way you need not
update this number when a new loader is added, just add that loader next to the
existing ones.
# What happens if no space is available after a volume is filled with cached
blocks? Do you still choose that volume? Do you need to update the count and
remove that volume from further writes? See the sketch below.
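One possible shape for such a check, assuming the round-robin selection from
the patch (the space check itself is only a sketch):

// Skip volumes that no longer have enough usable space for the block.
File getOneLocation(long blockLength) throws IOException {
  for (int i = 0; i < volumes.size(); i++) {
    File volume = volumes.get(nextIndex);
    nextIndex = (nextIndex + 1) % volumes.size();
    if (volume.getUsableSpace() >= blockLength) {
      return volume;
    }
  }
  throw new IOException("No pmem volume has " + blockLength
      + " bytes of free space left");
}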
# Javadoc comment: "TODO: Refine location selection policy by considering
storage utilization"
Add this TODO at the getOneLocation API as that's the relevant place. I still
don't think this volume management should be done by the loaders.
# private void loadVolumes(String[] volumes): even if one volume is bad, would
you throw an exception? Or do you have plans to skip that volume? I am ok if
you would like to consider this in the next JIRAs. The DN skips bad volumes and
proceeds, IIRC. See the sketch below.
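A sketch of that skip-and-continue behavior (loadVolumes and
verifyIfValidPmemVolume are from the patch; the LOG field and the
empty-list check are assumptions):

private void loadVolumes(String[] volumes) throws IOException {
  for (String dir : volumes) {
    try {
      File pmemDir = new File(dir);
      verifyIfValidPmemVolume(pmemDir);
      this.pmemVolumes.add(dir);
    } catch (IOException e) {
      // Mirror the DN behavior for regular volumes: log the bad volume
      // and keep going instead of failing the whole load.
      LOG.warn("Failed to load pmem volume " + dir + ", skipping it", e);
    }
  }
  if (this.pmemVolumes.isEmpty()) {
    throw new IOException("No valid pmem volume could be loaded");
  }
}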
# filePath = getOneLocation() + "/" + key.getBlockPoolId() + "-" +
key.getBlockId();
This may be a problem when you cache a larger number of blocks: they all go
into the same directory. The DN creates subdirs in regular volumes.
Is there any issue if we maintain the same path structure in the pmem volume?
That way you can also avoid storing that cached path in the replica. See the
sketch below.
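To mirror the DN layout, the cache path could be derived deterministically from
the block id, along the lines of DatanodeUtil#idToBlockDir (the exact shifts
and masks below are illustrative):

// Spread cached blocks over two levels of subdirs computed from the block
// id, instead of putting every cached file in one flat directory.
static File cacheFileFor(File pmemVolume, String bpid, long blockId) {
  int d1 = (int) ((blockId >> 16) & 0x1F);
  int d2 = (int) ((blockId >> 8) & 0x1F);
  File dir = new File(pmemVolume, bpid + File.separator
      + "subdir" + d1 + File.separator + "subdir" + d2);
  dir.mkdirs();
  return new File(dir, bpid + "-" + blockId);
}

Since the path is then a pure function of the key, FsDatasetCache could
recompute it on demand instead of storing it on the replica.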
Thanks [~PhiloHe] for the hard work you are doing on this.
> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -----------------------------------------------------------------
>
> Key: HDFS-14355
> URL: https://issues.apache.org/jira/browse/HDFS-14355
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: caching, datanode
> Reporter: Feilong He
> Assignee: Feilong He
> Priority: Major
> Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch,
> HDFS-14355.002.patch
>
>
> This task is to implement the caching to persistent memory using pure
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support
> isn't available or convenient in some environments or platforms.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]