[
https://issues.apache.org/jira/browse/HDFS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmytro Molkov updated HDFS-854:
-------------------------------
Attachment: HDFS-854.patch
Please have a look at the patch.
The problem we are trying to solve here is generating the first block report
more quickly after a restart by scanning the volumes in parallel. Instead of
scanning 12 TB of data sequentially, we scan 12 chunks of 1 TB in parallel.
Since disk IO has a lot of latency, this cuts the time to generate the block
report severalfold.
The test for this is just running the directory scanner test twice: with
parallel execution and without it.
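
For readers without the patch at hand, the idea can be sketched roughly as
follows. This is not the code from HDFS-854.patch; the class and method names
(ParallelVolumeScan, scanVolume, scanAll) are made up for illustration, and the
"blk_" file-name filter is a simplifying assumption about what counts as a
block file. Each volume gets its own task on a thread pool, and the `parallel`
flag allows the same scan to run on a single thread for comparison.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the actual DataNode or DirectoryScanner code.
public class ParallelVolumeScan {

    // Recursively collect files whose names start with "blk_" under one volume.
    // (Assumption: this prefix is used here only as a simple stand-in filter.)
    static List<String> scanVolume(File dir) {
        List<String> blocks = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return blocks; // not a directory, or an IO error occurred
        }
        for (File f : entries) {
            if (f.isDirectory()) {
                blocks.addAll(scanVolume(f));
            } else if (f.getName().startsWith("blk_")) {
                blocks.add(f.getPath());
            }
        }
        return blocks;
    }

    // Scan every volume: one thread per volume when parallel is true,
    // a single worker thread (sequential scan) otherwise.
    static List<String> scanAll(List<File> volumes, boolean parallel) throws Exception {
        int threads = parallel ? Math.max(1, volumes.size()) : 1;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (File volume : volumes) {
                Callable<List<String>> task = () -> scanVolume(volume);
                futures.add(pool.submit(task));
            }
            // Merge per-volume results in volume order.
            List<String> report = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                report.addAll(f.get());
            }
            return report;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling scanAll(volumes, true) and scanAll(volumes, false) on the same
directories should return the same list, which is the kind of check that
running the scanner test twice, with and without parallel execution, provides.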
> Datanode should scan devices in parallel to generate block report
> -----------------------------------------------------------------
>
> Key: HDFS-854
> URL: https://issues.apache.org/jira/browse/HDFS-854
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Reporter: dhruba borthakur
> Assignee: Dmytro Molkov
> Attachments: HDFS-854.patch
>
>
> A Datanode should scan its disk devices in parallel so that the time to
> generate a block report is reduced. This will reduce the startup time of a
> cluster.
> A datanode has 12 disks (each of 1 TB) to store HDFS blocks. There is a total
> of 150K blocks on these 12 disks. It takes the datanode up to 20 minutes to
> scan these devices to generate the first block report.