[ https://issues.apache.org/jira/browse/HDFS-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532161#comment-13532161 ]
liang xie commented on HDFS-3429:
---------------------------------
Still no obvious difference was found in another 100% read scenario that was
not IO-bound.
I ran "strace -p <DN pid> -f -tt -T -e trace=file -o bbb" during a run of
several minutes (without the patch), then:
grep "current/finalized" bbb|wc -l
16905
grep meta bbb|wc -l
9858
grep meta bbb|grep open|wc -l
3286
grep meta bbb|grep stat|wc -l
6572
grep meta bbb|grep "\".*\"" -o|sort -n |uniq -c|wc -l
303
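Taking the counts above at face value, 3286 opens + 6572 stats = 9858 meta lines, and 6572 = 2 x 3286, which would mean roughly two stat calls per open across about 303 distinct quoted strings. For anyone who wants to repeat the tally in one pass, here is a minimal sketch in Python that mirrors the grep pipelines. The function name `tally` and the `SAMPLE` lines are hypothetical, made up for illustration; they are not taken from the real trace.

```python
import re

# Same greedy match as: grep "\".*\"" -o  (first quote to last quote on the line)
QUOTED = re.compile(r'".*"')

def tally(lines):
    """Count strace lines the same way the grep pipelines do (substring matches)."""
    counts = {"finalized": 0, "meta": 0, "meta_open": 0, "meta_stat": 0}
    meta_paths = set()
    for line in lines:
        if "current/finalized" in line:
            counts["finalized"] += 1
        if "meta" in line:
            counts["meta"] += 1
            if "open" in line:
                counts["meta_open"] += 1
            if "stat" in line:
                counts["meta_stat"] += 1
            match = QUOTED.search(line)
            if match:
                meta_paths.add(match.group(0))
    # Equivalent of: sort | uniq -c | wc -l
    counts["distinct_meta"] = len(meta_paths)
    return counts

# Hypothetical lines in the shape "strace -f -tt -T" writes; not real trace data.
SAMPLE = [
    '101 10:00:00.000001 stat("/data/current/finalized/blk_1_1001.meta", {st_mode=S_IFREG|0644}) = 0 <0.000010>',
    '101 10:00:00.000020 stat("/data/current/finalized/blk_1_1001.meta", {st_mode=S_IFREG|0644}) = 0 <0.000008>',
    '101 10:00:00.000040 open("/data/current/finalized/blk_1_1001.meta", O_RDONLY) = 57 <0.000015>',
    '102 10:00:00.000100 open("/data/current/finalized/blk_1", O_RDONLY) = 58 <0.000012>',
]

result = tally(SAMPLE)
print(result)
```

To run it against the real trace, something like `tally(open("bbb"))` should work, since the function only iterates over lines.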
Most of those meta files are several hundred kilobytes in size; furthermore,
our OS has a default read_ahead_kb of 128, so it seems to make sense that the
benefit was not obvious. Any ideas, [~tlipcon]?
But I am +1 for this patch, since it can reduce some unnecessary IO and system
calls.
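A rough back-of-the-envelope sketch of the read-ahead argument, in Python. All parameters here are assumptions, not measurements from this cluster: a 64 MB block, one 4-byte CRC per 512 bytes of data, and a 7-byte meta file header. The function names are hypothetical. Under those assumptions a meta file comes out at about 512 KB, which fits the "several hundred kilobytes" observation, and a 128 KB read-ahead window would cover it in a handful of reads.

```python
def meta_file_bytes(block_bytes, bytes_per_checksum=512,
                    checksum_bytes=4, header_bytes=7):
    """Estimated size of a block's .meta file (all defaults are assumptions)."""
    chunks = -(-block_bytes // bytes_per_checksum)  # integer ceiling division
    return header_bytes + chunks * checksum_bytes

def readahead_windows(file_bytes, read_ahead_kb=128):
    """Number of read-ahead-sized windows needed to cover the whole file."""
    window = read_ahead_kb * 1024
    return -(-file_bytes // window)

block = 64 * 1024 * 1024          # assumed block size
meta = meta_file_bytes(block)
windows = readahead_windows(meta)
print(meta, windows)
```

If that estimate is in the right range, sequential checksum reads would mostly be served from the page cache after a few real IOs, which may be why skipping them shows little gain in a workload that is not IO-bound.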
> DataNode reads checksums even if client does not need them
> ----------------------------------------------------------
>
> Key: HDFS-3429
> URL: https://issues.apache.org/jira/browse/HDFS-3429
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, performance
> Affects Versions: 2.0.0-alpha
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hdfs-3429-0.20.2.patch, hdfs-3429-0.20.2.patch,
> hdfs-3429.txt, hdfs-3429.txt, hdfs-3429.txt
>
>
> Currently, even if the client does not want to verify checksums, the datanode
> reads them anyway and sends them over the wire. This means that performance
> improvements like HBase's application-level checksums don't have much benefit
> when reading through the datanode, since the DN is still causing seeks into
> the checksum file.
> (Credit goes to Dhruba for discovering this - filing on his behalf)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira