[jira] [Commented] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads

Colin Patrick McCabe (JIRA) Wed, 10 Dec 2014 15:16:41 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241924#comment-14241924
 ]


Colin Patrick McCabe commented on HDFS-7430:
--------------------------------------------

bq. What about blocks that go into already scanned subdirs?  Do these subdirs 
have to wait for their next turn to be scanned?

Yes.  They will have to wait until the next scan period (normally a week or 
two).

In general, block scanning is always going to be a probabilistic thing.  We 
never know when blocks are going to go bad.  Perhaps a block goes bad right 
after we scan it, and then we don't find out until two weeks later when we 
rescan.  This could have happened under the old block scanner.  The function of 
the block scanner isn't to immediately detect all blocks going bad (this is 
simply not possible), but to ensure that we look at old data every once in a 
while.

bq. If there are very few blocks (just for example) and scanning completes 
quickly, will the scanning start over again?

We will restart scanning after {{dfs.datanode.scan.period.hours}} have gone by.

If we can't complete a full scan within {{dfs.datanode.scan.period.hours}}, 
then we will scan continuously without a break.
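
To make the two cases above concrete, here is a minimal sketch of the timing rule.  It is not the patch's code: the function name {{next_scan_start}} is hypothetical, the 336-hour value is only an illustrative two-week setting for {{dfs.datanode.scan.period.hours}}, and it assumes the period is measured from the start of the previous scan.

```python
from datetime import datetime, timedelta


def next_scan_start(scan_start, scan_end, period_hours):
    """Return the time at which the next full scan may begin.

    Assumption (not taken from the patch): the period is measured
    from the start of the previous scan.
    """
    earliest = scan_start + timedelta(hours=period_hours)
    # If the scan overran the period, the next one starts immediately,
    # which is the "scan continuously without a break" case.
    return max(earliest, scan_end)


start = datetime(2014, 12, 1)

# Few blocks: the scan finishes in one hour, so the scanner idles
# until the period has gone by.
quick = next_scan_start(start, start + timedelta(hours=1), period_hours=336)

# Many blocks: the scan takes longer than the period, so the next scan
# begins as soon as this one ends.
slow = next_scan_start(start, start + timedelta(hours=400), period_hours=336)
```

In the first case the next scan starts 336 hours after the previous one started; in the second it starts the moment the previous one finishes.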

bq. How are new blocks identified among already scanned blocks in the same 
subdir, given that no record of the previous scan is maintained?

There is no distinction between "old" and "new" blocks.  There is just an 
iterator and your position.  Your position is saved to the cursor file 
periodically, so that if the datanode is restarted, we pick up the scan more or 
less where we left off.
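
The iterator-plus-cursor idea can be sketched roughly as follows.  This is an illustration under assumptions, not the datanode's implementation: the JSON cursor format, the helper names ({{load_cursor}}, {{save_cursor}}, {{scan}}), the {{save_every}} interval, and the use of sorted block names as the iteration order are all hypothetical.

```python
import bisect
import json
import os
import tempfile


def load_cursor(path):
    """Return the last saved block name, or None if no cursor file exists."""
    try:
        with open(path) as f:
            return json.load(f)["last_block"]
    except FileNotFoundError:
        return None


def save_cursor(path, last_block):
    """Persist the position; write-then-rename avoids a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_block": last_block}, f)
    os.replace(tmp, path)


def scan(blocks, cursor_path, limit, save_every=2):
    """Visit up to `limit` blocks after the saved cursor; return them."""
    ordered = sorted(blocks)
    last = load_cursor(cursor_path)
    start = 0 if last is None else bisect.bisect_right(ordered, last)
    visited = []
    for block in ordered[start:start + limit]:
        visited.append(block)  # stand-in for verifying the block
        if len(visited) % save_every == 0:
            save_cursor(cursor_path, block)  # periodic, not per-block
    return visited


cursor_path = os.path.join(tempfile.mkdtemp(), "scanner.cursor")

# The scanner visits three blocks, then the process "restarts".
first = scan(["blk_1", "blk_3", "blk_5", "blk_7"], cursor_path, limit=3)

# blk_2 and blk_9 were added while the scanner was down.
second = scan(["blk_1", "blk_2", "blk_3", "blk_5", "blk_7", "blk_9"],
              cursor_path, limit=10)
```

Only the cursor is kept, so memory use does not grow with the number of blocks.  In this sketch {{blk_5}} is visited twice because the cursor was last saved at {{blk_3}} (the "more or less" above), {{blk_9}} is picked up because it sorts after the cursor, and {{blk_2}} waits for the next pass because it sorts before it.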

> Refactor the BlockScanner to use O(1) memory and use multiple threads
> ---------------------------------------------------------------------
>
>                 Key: HDFS-7430
>                 URL: https://issues.apache.org/jira/browse/HDFS-7430
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch, 
> HDFS-7430.004.patch, HDFS-7430.005.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by 
> keeping track of what block was scanned last, rather than by tracking the 
> scan status of all blocks in memory.  Also, instead of having just one 
> thread, we should have a verification thread per hard disk (or other volume), 
> scanning at a configurable rate of bytes per second.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
