[ https://issues.apache.org/jira/browse/HDFS-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494480#comment-13494480 ]

Todd Lipcon commented on HDFS-3429:
-----------------------------------

bq. The variable name might be a bit confusing where checksum reading depends 
on sendChecksum flag.

Not sure what you mean. We need to read the checksum off disk in either of two 
cases:
- verifyChecksum: we plan to verify it on the server side
- sendChecksum: we plan to send it to the client

These flags are used independently for various purposes (see the sketch after 
this list), e.g.:
- Block scanner: the BlockSender is "sending" to a null sink, with the 
verifyChecksum flag set. This causes it to throw an error if the checksum 
doesn't match. 
- Normal read: the DataNode doesn't verify the checksum - instead, it just 
sends it to the client, which verifies it
- Checksum-less read: neither verifies nor sends -- in this case, we don't want 
to read it off disk.
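
To make the combinations concrete, here's a rough sketch (not the actual 
BlockSender code; the class and method names are made up) of how the two 
flags decide whether the checksum gets read off disk at all:

{code:java}
// Illustrative sketch only -- not the real BlockSender. The checksum file is
// touched if and only if at least one of the two flags is set.
public class ChecksumFlagsSketch {
  static boolean readChecksumFromDisk(boolean verifyChecksum, boolean sendChecksum) {
    return verifyChecksum || sendChecksum;
  }

  public static void main(String[] args) {
    System.out.println(readChecksumFromDisk(true, false));  // block scanner: true
    System.out.println(readChecksumFromDisk(false, true));  // normal read: true
    System.out.println(readChecksumFromDisk(false, false)); // checksum-less read: false
  }
}
{code}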

bq. Why was the length adjustment omitted in BlockSender ctor ?

It wasn't omitted, just moved to a different part of the function.

bq. The above javadoc is inconsistent with the following code change:
Fixed

bq. If we don't need to read checksum, why would numChunks be 0 ?

If we're not reading checksums, then we don't need to "chunk" the data at all - 
we can send exactly as many bytes as are requested or fit into the packet. The 
concept of chunks is itself only relevant in the context of checksummed data. 
I'll add more commentary here.
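
As a rough illustration (made-up names, not the actual patch), the packet 
sizing logic only has to think in chunks when checksums are involved:

{code:java}
// Illustrative sketch only. With checksums, the payload is carved into
// bytesPerChecksum-sized chunks so each chunk's checksum can ride along;
// without checksums there's nothing to align to, so numChunks stays 0 and
// we just send as many raw bytes as fit.
public class PacketSizingSketch {
  static int numChunks(long payloadLen, int bytesPerChecksum, boolean readChecksum) {
    if (!readChecksum) {
      return 0; // no chunking needed for checksum-less reads
    }
    return (int) ((payloadLen + bytesPerChecksum - 1) / bytesPerChecksum);
  }

  public static void main(String[] args) {
    System.out.println(numChunks(65536, 512, true));  // 128 chunks
    System.out.println(numChunks(65536, 512, false)); // 0 -- raw bytes only
  }
}
{code}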

bq. I am using the hadoop-0.20.2, so I want to fix the problem in 
hadoop-0.20.2, can you give me some advices about how to fix problem in 
hadoop-0.20.2?

I don't know if it's going to be possible to fix this for 0.20.2 without 
breaking wire compatibility. The patch you uploaded is likely not sufficient - 
have you tested it? Let's get this into trunk and branch-2 before worrying 
about an old maintenance branch.

bq. The failed test seems like it might be legit. I will look into it.

Indeed, the failed test turned out to be caused by the upgrade test using 
files with a checksum length of 60, which didn't divide evenly into the 
configured packet size. The new patch rounds the packet size down to align 
to a chunk boundary, which fixed the test.
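
For the curious, the rounding is just integer arithmetic along these lines 
(sketch only, not the exact code from the patch):

{code:java}
// Illustrative sketch only: round the packet payload down to a whole number
// of checksum chunks so a packet never ends in the middle of a chunk.
public class PacketAlignSketch {
  static int alignDownToChunk(int packetSize, int bytesPerChecksum) {
    int chunks = Math.max(1, packetSize / bytesPerChecksum);
    return chunks * bytesPerChecksum;
  }

  public static void main(String[] args) {
    // e.g. a 65536-byte packet with 60-byte chunks rounds down to 65520 bytes
    System.out.println(alignDownToChunk(65536, 60)); // 65520
  }
}
{code}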

                
> DataNode reads checksums even if client does not need them
> ----------------------------------------------------------
>
>                 Key: HDFS-3429
>                 URL: https://issues.apache.org/jira/browse/HDFS-3429
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, performance
>    Affects Versions: 2.0.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-3429-0.20.2.patch, hdfs-3429.txt, hdfs-3429.txt
>
>
> Currently, even if the client does not want to verify checksums, the datanode 
> reads them anyway and sends them over the wire. This means that performance 
> improvements like HBase's application-level checksums don't have much benefit 
> when reading through the datanode, since the DN is still causing seeks into 
> the checksum file.
> (Credit goes to Dhruba for discovering this - filing on his behalf)
