[ 
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-7430:
---------------------------------------
    Attachment: HDFS-7430.001.patch

* Added {{dfs.datanode.scan.period.hours}} to {{hdfs-default.xml}} (it wasn't 
there before).

* This patch adds a new configuration key, 
{{dfs.block.scanner.volume.bytes.per.second}}, which is the maximum number of 
bytes per second we should scan on each volume.  It defaults to 1MB/s per disk. 
 Previously, the maximum rate was a hard-coded 8MB/s for the DN as a whole 
(i.e. NOT per disk).

* Moved {{TestDatanodeBlockScanner#changeReplicaLength}} to 
{{DFSTestUtil#changeReplicaLength}}

* Instead of writing out a "verification log entry" for each replica it scans, 
the scanner now keeps track of a "cursor" which represents the last block 
scanned in the block pool slice on the volume.  (So a volume with 3 block pool 
slices may have 3 cursors.)  The cursor is saved to a file every few minutes, 
but only if it has changed.  The {{BlockIterator}} interface in {{FsVolumeSpi}} 
implements these cursors.

* Use one thread per disk.  This avoids situations where a slow or stuck disk 
can effectively stop the blockscanner from making any progress.  It also allows 
us to scale effectively (e.g. in high-density nodes with 20 drives).

* Added methods to get block iterators to {{FsDatasetSpi}}; removed 
{{RollingLogs}} methods from {{FsDatasetSpi}}.
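
To make the per-volume rate limit and the one-thread-per-disk idea concrete, here is a minimal sketch. It is not code from the patch: {{VolumeScanThrottle}} and {{PerVolumeThreads}} are hypothetical names, and the throttle only caps the long-run average rate since the scanner started, which is an assumption about what "bytes per second" should mean here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical per-volume throttle (illustration only, not the HDFS class).
 * It tracks the total bytes scanned and reports how long the caller should
 * wait so that the average rate stays at or below bytesPerSecond.
 */
final class VolumeScanThrottle {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    private final long bytesPerSecond;
    private final long startNanos;
    private long bytesScanned;

    VolumeScanThrottle(long bytesPerSecond, long startNanos) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
        this.startNanos = startNanos;
    }

    /** Records bytes just scanned; returns nanoseconds to wait (0 if none). */
    long record(long bytes, long nowNanos) {
        bytesScanned += bytes;
        // Split into whole seconds and a remainder so that the multiplication
        // by NANOS_PER_SECOND does not overflow for large byte totals.
        long wholeSeconds = bytesScanned / bytesPerSecond;
        long remainderBytes = bytesScanned % bytesPerSecond;
        long earliestNanos = startNanos
            + wholeSeconds * NANOS_PER_SECOND
            + remainderBytes * NANOS_PER_SECOND / bytesPerSecond;
        return Math.max(0L, earliestNanos - nowNanos);
    }
}

/** Hypothetical helper: starts one daemon thread per volume. */
final class PerVolumeThreads {
    private PerVolumeThreads() {}

    static List<Thread> start(List<String> volumes,
                              Function<String, Runnable> taskFor) {
        List<Thread> threads = new ArrayList<>();
        for (String volume : volumes) {
            // Each volume gets its own thread, so a stuck disk only blocks
            // the scan of that one volume.
            Thread t = new Thread(taskFor.apply(volume), "scanner-" + volume);
            t.setDaemon(true);
            t.start();
            threads.add(t);
        }
        return threads;
    }
}
```

In this sketch each volume's thread would own its own {{VolumeScanThrottle}}, so the limit applies per disk rather than to the DN as a whole.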

> Refactor the BlockScanner to use O(1) memory and use multiple threads
> ---------------------------------------------------------------------
>
>                 Key: HDFS-7430
>                 URL: https://issues.apache.org/jira/browse/HDFS-7430
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>         Attachments: HDFS-7430.001.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by 
> keeping track of what block was scanned last, rather than by tracking the 
> scan status of all blocks in memory.  Also, instead of having just one 
> thread, we should have a verification thread per hard disk (or other volume), 
> scanning at a configurable rate of bytes per second.
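
A rough sketch of the "remember only the last scanned block" idea follows. It is an illustration under assumptions, not the {{BlockIterator}} from the patch: {{ScanCursor}} is a hypothetical name, the cursor file format (a single decimal block ID) is invented, and the in-memory sorted set merely stands in for the on-disk block listing that a real scanner would walk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.NavigableSet;

/**
 * Hypothetical cursor over block IDs (illustration only).
 * The only scan state kept is the ID of the last block scanned, so the
 * scanner's own bookkeeping does not grow with the number of blocks.
 */
final class ScanCursor {
    private final Path cursorFile;
    private Long lastScanned; // null until the first block is scanned

    /** Loads a previously saved cursor, if the file holds one. */
    ScanCursor(Path cursorFile) throws IOException {
        this.cursorFile = cursorFile;
        if (Files.exists(cursorFile)) {
            String saved = new String(
                Files.readAllBytes(cursorFile), StandardCharsets.UTF_8).trim();
            lastScanned = saved.isEmpty() ? null : Long.valueOf(saved);
        }
    }

    /**
     * Advances to the next block after the cursor, wrapping around to the
     * first block at the end. Returns null if there are no blocks.
     * blockIds is a stand-in for the volume's on-disk block listing.
     */
    Long next(NavigableSet<Long> blockIds) {
        if (blockIds.isEmpty()) {
            return null;
        }
        Long candidate = (lastScanned == null)
            ? blockIds.first()
            : blockIds.higher(lastScanned);
        if (candidate == null) {
            candidate = blockIds.first(); // reached the end, start over
        }
        lastScanned = candidate;
        return candidate;
    }

    /** Persists the cursor so a restart can resume where the scan left off. */
    void save() throws IOException {
        if (lastScanned != null) {
            Files.write(cursorFile,
                lastScanned.toString().getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

A caller would invoke {{save()}} periodically rather than after every block, which is what replaces the per-replica verification log entries.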



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
