[ 
https://issues.apache.org/jira/browse/HDFS-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703745#comment-13703745
 ] 

Colin Patrick McCabe commented on HDFS-4960:
--------------------------------------------

Two proposals:
* Check the version on the DataNode side rather than on the client side.  Then 
we don't have to re-check the header of checksum files that are already in the 
FileInputStreamCache.
* Add an optional boolean to OpRequestShortCircuitAccessProto that can be used 
to request *only* the block file, not the checksum file.  This avoids the 
overhead of duplicating the second file descriptor.  In our testing, this 
overhead was much higher than that of just doing a read.
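
To make the two proposals concrete, here is a minimal Java sketch of how they 
might fit together. It is not HDFS code: ShortCircuitSketch, Request, Response, 
and the blockFileOnly field are hypothetical names, strings stand in for file 
descriptors, and the header layout (7 bytes, starting with a 2-byte big-endian 
version whose expected value is 1) is an assumption based on the 7-byte read 
mentioned in the description.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: every name here is hypothetical, not an HDFS class. */
final class ShortCircuitSketch {
    // Assumption: the .meta header is 7 bytes and starts with a 2-byte
    // big-endian version; the expected value 1 is also an assumption.
    static final int HEADER_LENGTH = 7;
    static final short EXPECTED_VERSION = 1;

    /** Stand-in for the request; blockFileOnly models the proposed optional boolean. */
    static final class Request {
        final boolean blockFileOnly;

        Request(boolean blockFileOnly) {
            this.blockFileOnly = blockFileOnly;
        }
    }

    /** Stand-in for the response; strings take the place of file descriptors. */
    static final class Response {
        final List<String> descriptors = new ArrayList<>();
    }

    /** Proposal 1: validate the header version on the DataNode side. */
    static void checkMetaVersion(byte[] metaHeader) {
        if (metaHeader.length < HEADER_LENGTH) {
            throw new IllegalStateException("Truncated .meta header");
        }
        short version = ByteBuffer.wrap(metaHeader).getShort();
        if (version != EXPECTED_VERSION) {
            throw new IllegalStateException("Unexpected .meta version: " + version);
        }
    }

    /** Proposal 2: hand out only the block file when the client asks for that. */
    static Response handle(Request request, byte[] metaHeader) {
        checkMetaVersion(metaHeader);
        Response response = new Response();
        response.descriptors.add("block");
        if (!request.blockFileOnly) {
            response.descriptors.add("meta");
        }
        return response;
    }
}
```

With this shape, a client that skips checksums would send blockFileOnly = true 
and get back a single descriptor, and it would never need to read the .meta 
header itself because the version check has already happened on the DataNode.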

Also, some reminders:
* If you're optimizing for SSDs (as it says in the description), seeks don't 
matter!
* The .meta file will likely be in the cache after a few reads, so seeks won't 
happen anyway.
* Let's not commit anything without at least one test that shows the change is 
an improvement.
                
> Unnecessary .meta seeks even when skip checksum is true
> -------------------------------------------------------
>
>                 Key: HDFS-4960
>                 URL: https://issues.apache.org/jira/browse/HDFS-4960
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Varun Sharma
>            Assignee: Varun Sharma
>         Attachments: 4960-branch2.patch, 4960-trunk.patch
>
>
> While attempting to benchmark an HBase + Hadoop 2.0 setup on SSDs, we found 
> unnecessary seeks into .meta files. Each seek was a 7-byte read at the head 
> of the file, which attempts to validate the version number. Since the client 
> is requesting no checksum, we should not need to touch the .meta file at 
> all.
> Since the purpose of skip checksum is also to avoid the performance penalty 
> of the extra seek, we should not seek into .meta if skip checksum is 
> true.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
