[ 
https://issues.apache.org/jira/browse/HDFS-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797627#comment-16797627
 ] 

Uma Maheswara Rao G commented on HDFS-14355:
--------------------------------------------

[~PhiloHe] Thanks for working on this. Thanks [~rakeshr] for the thorough reviews.

I have a few more comments to address:
 # // Remove all files under the volume.
 FileUtils.cleanDirectory(locFile);
 Is it safe to clean the existing files?
 # public static void verifyIfValidPmemVolume(File pmemDir)
 Why is this public? Would package scope not work?
 # FileMappableBlockLoader: currently this implementation handles more than 
block loading, e.g. volume management. You may want to consider a 
PmemVolumeManager class which manages all these volumes, verification, etc. 
(see the first sketch after this list). I am OK if you want to take this up 
in another JIRA. The same volume management may be needed in HDFS-14356 as 
well; considering that, it is worth moving to a common place. What do you say?
 # FileMappableBlockLoader: this class actually implements the block loading 
for pmem. So, should the name be 
PmemFileMappableBlockLoader/PmemMappableBlockLoader? The HDFS-14356 
implementation could then be named NativePmemMappableBlockLoader (the 
PMDK-based impl)? Does this make sense? cc [~rakeshr]
 # FileMappableBlockLoader: currently you are reading data from the input 
stream, verifying the checksum and writing that buffer to the MBB.
One thought here: how about using the FileChannel#transferTo API to transfer 
data from one channel to the other natively, then mmap the destination file 
(assuming mmap may be faster on the target file) and do the checksum 
verification on it? See the transferTo sketch after this list.
I have not checked this here, just checking whether it gives any further 
improvement, as we would avoid reading from the HDD into user space.
 # FileMappedBlock: afterCache basically gets the block replica and sets the 
path, and this method is used in FsDatasetCache. Why not expose the required 
info from MappedBlock, i.e. the key and filePath, and call 
replica#setCachePath directly in FsDatasetCache?
 ReplicaInfo replica = dataset.getBlockReplica(key.getBlockPoolId(),
     key.getBlockId());
 replica.setCachePath(filePath);
 # if (!result) {
   throw new IOException();
 }
 Please add a message here: the possible reason for the failure, etc.
 # Config description: "Currently, there are two cache loaders implemented:"
I don't think you need to mention the exact number of loaders (1/2/3); 
instead you can say "Currently available cache loaders are x, y". Then you 
need not update the number when a new loader is added; just add the new 
loader next to the existing ones.
 # What happens if no space is available after a volume is filled with cached 
blocks? Do you still choose that volume? Do you need to update the count and 
remove that volume from further writes?
 # Javadoc comment: "TODO: Refine location selection policy by considering 
storage utilization"
Add this TODO at the getOneLocation API, as that is the relevant place. I 
still don't think this volume management should be done by the loaders.
 # private void loadVolumes(String[] volumes): if even one volume is bad, 
would you throw an exception? Or do you plan to skip that volume, as in the 
loadVolumes sketch after this list? I am OK if you would like to consider 
this in the next JIRAs. The DN skips bad volumes and proceeds, IIRC.
 # filePath = getOneLocation() + "/" + key.getBlockPoolId() + "-" + 
key.getBlockId();
This may be a problem when you cache a large number of blocks: they all go 
into the same directory, while the DN creates subdirs in its regular volumes. 
Is there any issue if we maintain the same path structure in the pmem volume 
(see the path sketch after this list)? That way you can also avoid storing 
the cached path in the replica.
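
To make the volume-management suggestion above concrete, here is a rough, 
untested sketch; all class and method names are placeholders, not a final 
design:

{code:java}
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a manager that owns volume loading, verification and
// location selection, so the loader is left with block loading only.
class PmemVolumeManager {
  private final List<String> pmemVolumes = new ArrayList<>();
  private int nextIndex = 0;

  // Load the configured volumes, verifying each one up front.
  void loadVolumes(String[] volumes) throws IOException {
    for (String volume : volumes) {
      verifyIfValidPmemVolume(new File(volume));
      pmemVolumes.add(volume);
    }
    if (pmemVolumes.isEmpty()) {
      throw new IOException("No usable pmem volumes were configured");
    }
  }

  // Package-private rather than public; placeholder checks only.
  void verifyIfValidPmemVolume(File pmemDir) throws IOException {
    if (!pmemDir.isDirectory() || !pmemDir.canWrite()) {
      throw new IOException(pmemDir + " is not a writable directory");
    }
  }

  // Simple round-robin selection; refining this by storage utilization
  // would be the TODO mentioned above.
  synchronized String getOneLocation() {
    String volume = pmemVolumes.get(nextIndex);
    nextIndex = (nextIndex + 1) % pmemVolumes.size();
    return volume;
  }
}
{code}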
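For the FileChannel#transferTo thought, this is roughly what I mean 
(untested; verifyChecksum is a placeholder for the real meta-file checksum 
verification):

{code:java}
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Untested sketch: copy block data channel-to-channel with transferTo (the
// kernel can avoid the round trip through user space), then mmap the pmem
// file and verify the checksum on the mapping.
final class TransferToSketch {
  static MappedByteBuffer cacheBlock(Path blockFile, Path pmemFile, long length)
      throws IOException {
    try (FileChannel src = FileChannel.open(blockFile, StandardOpenOption.READ);
         FileChannel dst = FileChannel.open(pmemFile,
             StandardOpenOption.CREATE, StandardOpenOption.READ,
             StandardOpenOption.WRITE)) {
      long pos = 0;
      // transferTo may transfer fewer bytes than requested, so loop.
      while (pos < length) {
        long n = src.transferTo(pos, length - pos, dst);
        if (n <= 0) {
          throw new IOException("Premature EOF copying " + blockFile);
        }
        pos += n;
      }
      MappedByteBuffer mbb = dst.map(FileChannel.MapMode.READ_ONLY, 0, length);
      verifyChecksum(mbb);
      return mbb;
    }
  }

  // Placeholder: the real code would walk the mapping chunk by chunk and
  // compare against the checksums from the block meta file.
  private static void verifyChecksum(MappedByteBuffer mbb) throws IOException {
  }
}
{code}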
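And for loadVolumes, a DN-style variant that skips bad volumes instead of 
failing outright (again just a sketch; it fails only when nothing is usable):

{code:java}
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Sketch: tolerate bad volumes the way the DN does and only fail when no
// volume at all is usable.
final class SkipBadVolumesSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(SkipBadVolumesSketch.class);
  private final List<String> pmemVolumes = new ArrayList<>();

  void loadVolumes(String[] volumes) throws IOException {
    for (String volume : volumes) {
      try {
        verify(new File(volume));
        pmemVolumes.add(volume);
      } catch (IOException e) {
        LOG.warn("Skipping bad pmem volume " + volume, e);
      }
    }
    if (pmemVolumes.isEmpty()) {
      throw new IOException("All configured pmem volumes are bad: "
          + Arrays.toString(volumes));
    }
  }

  // Placeholder verification, standing in for verifyIfValidPmemVolume.
  private void verify(File dir) throws IOException {
    if (!dir.isDirectory() || !dir.canWrite()) {
      throw new IOException(dir + " is not a writable directory");
    }
  }
}
{code}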
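Finally, for the flat-directory concern: the DN fans blocks out into two 
levels of subdirs (cf. DatanodeUtil#idToBlockDir), and a pmem cache path 
could be derived the same way. Sketch only:

{code:java}
import java.io.File;

// Sketch: mirror the DN's two-level subdir fan-out (32 x 32 directories)
// instead of putting every cached block file into one flat directory.
final class PmemCachePathSketch {
  static File cacheFileFor(File volume, String bpid, long blockId) {
    int d1 = (int) ((blockId >> 16) & 0x1F);
    int d2 = (int) ((blockId >> 8) & 0x1F);
    File dir = new File(volume, bpid + "/subdir" + d1 + "/subdir" + d2);
    return new File(dir, String.valueOf(blockId));
  }
}
{code}

Since the path is then a pure function of the block pool id and block id, it 
can be recomputed on demand instead of being stored on the replica.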

Thanks [~PhiloHe] for the hard work you are doing on this.  

> Implement HDFS cache on SCM by using pure java mapped byte buffer
> -----------------------------------------------------------------
>
>                 Key: HDFS-14355
>                 URL: https://issues.apache.org/jira/browse/HDFS-14355
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: caching, datanode
>            Reporter: Feilong He
>            Assignee: Feilong He
>            Priority: Major
>         Attachments: HDFS-14355.000.patch, HDFS-14355.001.patch, 
> HDFS-14355.002.patch
>
>
> This task is to implement the caching to persistent memory using pure 
> {{java.nio.MappedByteBuffer}}, which could be useful in case native support 
> isn't available or convenient in some environments or platforms.


