[
https://issues.apache.org/jira/browse/HDFS-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872890#comment-13872890
]
Haohui Mai commented on HDFS-5783:
----------------------------------
A preliminary experiment shows that the change has little impact on loading
time. I loaded a 512 MB fsimage on my laptop; here are the numbers:
# Loading the FSImage, computing the digest with {{DigestInputStream}}: 9920ms
# Loading the FSImage, without computing the digest: 7467ms
# Calculating MD5 independently: 1231ms
(2) + (3) takes 8698ms in total, slightly faster than (1), because the current
code cannot consume all of the I/O bandwidth when loading the fsimage.
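
For illustration, the two strategies compared above can be sketched roughly as follows. This is a minimal sketch, not the actual HDFS code: the class {{DigestSketch}} and the methods {{consume}}, {{digestWhileLoading}}, {{digestOnly}} and {{toHex}} are hypothetical names, and {{consume}} merely drains the stream as a stand-in for the real image loader. Only {{DigestInputStream}} and {{MessageDigest}} are real JDK classes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; none of these names come from the HDFS code base.
class DigestSketch {

    // Stand-in for the image loader (assumption): it only drains the stream
    // sequentially and returns the number of bytes read.
    static long consume(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    // Case (1): the digest is updated on the fly as the loader reads.
    // The result is only the digest of the whole file if the loader
    // reads every byte exactly once, in order.
    static byte[] digestWhileLoading(InputStream raw)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (DigestInputStream in = new DigestInputStream(raw, md)) {
            consume(in);
        }
        return md.digest();
    }

    // Case (3): the digest is computed in a separate sequential pass,
    // independent of how the loader later reads the file.
    static byte[] digestOnly(InputStream raw)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        int n;
        while ((n = raw.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        return md.digest();
    }

    // Formats a digest as lowercase hex for comparison or logging.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

With a loader that reads strictly sequentially, both methods return the same digest for the same input. Running {{digestOnly}} first and then loading without a {{DigestInputStream}} corresponds to (3) followed by (2), and leaves the loader free to read the sections in any order.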
> Compute the digest before loading FSImage
> -----------------------------------------
>
> Key: HDFS-5783
> URL: https://issues.apache.org/jira/browse/HDFS-5783
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: HDFS-5698 (FSImage in protobuf)
> Reporter: Haohui Mai
> Assignee: Haohui Mai
>
> When loading the fsimage, the current code computes its MD5 digest
> on-the-fly. It does not work when the code does not read all the sections in
> strictly sequential order.
> This jira proposes to compute the MD5 digest before loading fsimage.