[
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240729#comment-14240729
]
Vinayakumar B commented on HDFS-7430:
-------------------------------------
I haven't gone through the full patch yet, but I did check the iterator.
I get the idea: the next block to scan is found by walking through the
subdirectories in order and picking the next block from there.
Currently the subdir path is derived directly from the blockId, so it is not
guaranteed that the sorted order of the subdirs matches the creation order of
the blocks.
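To make the ordering concern concrete, here is a minimal sketch. It assumes a two-level layout in which each level is taken from one byte of the block ID and directories are named "subdirN"; the class name, the method names, and the exact masks are illustrative assumptions, not code from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SubdirOrderSketch {
    // Assumed layout: level 1 from bits 16-23, level 2 from bits 8-15 of the block ID.
    static String idToSubdir(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xff);
        int d2 = (int) ((blockId >> 8) & 0xff);
        return "subdir" + d1 + "/subdir" + d2;
    }

    // Returns the subdirs of the given block IDs in lexicographic (string) order,
    // which is the order a sorted directory walk would visit them in.
    static List<String> sortedSubdirs(long... blockIds) {
        List<String> dirs = new ArrayList<>();
        for (long id : blockIds) {
            dirs.add(idToSubdir(id));
        }
        Collections.sort(dirs);
        return dirs;
    }

    public static void main(String[] args) {
        // Block 0x0200 is created before block 0x0A00, yet its subdir sorts after it.
        System.out.println(sortedSubdirs(0x0200L, 0x0A00L));
    }
}
```

Under these assumptions "subdir0/subdir10" sorts before "subdir0/subdir2", so the later block is visited first. Block IDs that differ only above bit 23 also map to the same subdir, so a new block can land in a subdir that was already visited.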
I have a few questions here.
* What about blocks that go into already-scanned subdirs?
Do these subdirs have to wait for their next turn to be scanned?
* If there are very few blocks (just for example) and the scan completes quickly,
will the scan then start over again?
* How are new blocks identified among already-scanned blocks in the same subdir,
given that no information about the previous scan is maintained?
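The first and third questions can be illustrated with a toy model of a cursor-based scan. This is only a sketch under stated assumptions: the class CursorScanSketch, its methods, and the subdir-level cursor are hypothetical and are not the iterator from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.TreeSet;

class CursorScanSketch {
    // subdir name -> block IDs stored there; TreeMap keeps subdirs in sorted order.
    final TreeMap<String, TreeSet<Long>> subdirs = new TreeMap<>();
    // Last subdir visited in the current pass; null means the pass has not started.
    String cursor = null;

    void addBlock(String subdir, long blockId) {
        subdirs.computeIfAbsent(subdir, k -> new TreeSet<>()).add(blockId);
    }

    // Scans the subdir that follows the cursor and returns the blocks found there.
    // Returns an empty list at the end of a pass and rewinds the cursor.
    List<Long> scanNextSubdir() {
        String next;
        if (cursor == null) {
            next = subdirs.isEmpty() ? null : subdirs.firstKey();
        } else {
            next = subdirs.higherKey(cursor);
        }
        if (next == null) {
            cursor = null;
            return new ArrayList<>();
        }
        cursor = next;
        return new ArrayList<>(subdirs.get(next));
    }
}
```

In this model only the cursor is remembered, so a block added to a subdir behind the cursor is not seen until the next pass reaches that subdir again, and nothing distinguishes it from the blocks that were scanned in the previous pass.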
> Refactor the BlockScanner to use O(1) memory and use multiple threads
> ---------------------------------------------------------------------
>
> Key: HDFS-7430
> URL: https://issues.apache.org/jira/browse/HDFS-7430
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.7.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch,
> HDFS-7430.004.patch, HDFS-7430.005.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by
> keeping track of what block was scanned last, rather than by tracking the
> scan status of all blocks in memory. Also, instead of having just one
> thread, we should have a verification thread per hard disk (or other volume),
> scanning at a configurable rate of bytes per second.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)