[
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279501#comment-14279501
]
Andrew Wang commented on HDFS-7430:
-----------------------------------
Hey Colin, thanks for revving, I took another look. Just nits, I'm +1 pending
these:
The DFSConfigKeys spacing, we want the variable names to line up. Maybe we're
talking past each other here...this is the current:
{code}
public static final int DFS_DATANODE_SCAN_PERIOD_HOURS_DEFAULT = 0;
public static final String DFS_BLOCK_SCANNER_VOLUME_BYTES_PER_SECOND =
"dfs.block.scanner.volume.bytes.per.second";
public static final long DFS_BLOCK_SCANNER_VOLUME_BYTES_PER_SECOND_DEFAULT =
1048576L;
public static final String DFS_DATANODE_TRANSFERTO_ALLOWED_KEY =
"dfs.datanode.transferTo.allowed";
{code}
Desired:
{code}
public static final int DFS_DATANODE_SCAN_PERIOD_HOURS_DEFAULT = 0;
public static final String DFS_BLOCK_SCANNER_VOLUME_BYTES_PER_SECOND =
"dfs.block.scanner.volume.bytes.per.second";
public static final long DFS_BLOCK_SCANNER_VOLUME_BYTES_PER_SECOND_DEFAULT
= 1048576L;
public static final String DFS_DATANODE_TRANSFERTO_ALLOWED_KEY =
"dfs.datanode.transferTo.allowed";
{code}
FSVolumeImpl
* Unused imports
* Would be good to wrap the LOG.trace in save and load with isTraceEnabled,
which will let us skip doing the writeValueAsString. slf4j just lets us skip
string construction costs, the parameters are still resolved.
* I guess we can not do the iterator-of-iterator, but could we still rename
{{eof}} to something more descriptive? Don't need to stick to error code names
here.
* Regarding the iterator extending Closeable, can't we just add Closeable when
there's a need for it? This is a private interface after all.
TestOverReplicatedBlocks, opportunity to fix the spacing in this line:
{code}
for(int i=0; !scanCursor.delete(); i++) {
{code}
BlockScanner:
* INTERNAL_DFS_BLOCK_SCANNER_THRESHOLD_BYTES_PER_SECOND_DEFAULT is never used?
IIRC this was a "min" to avoid dribbly little scans, but I'm ok with not having
that.
TestBlockScanner:
* testVolumeIteratorImpl is long and the flow control is quite complicated, is
it possible to simplify / break this up? Seems worth it even at the cost of
some duplicated code.
* A bunch of the log statements prepend "TestBlockScanner:" but isn't this
unnecessary? Our log4j format includes the log's name, which is
TestBlockScanner.
> Refactor the BlockScanner to use O(1) memory and use multiple threads
> ---------------------------------------------------------------------
>
> Key: HDFS-7430
> URL: https://issues.apache.org/jira/browse/HDFS-7430
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.7.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch,
> HDFS-7430.004.patch, HDFS-7430.005.patch, HDFS-7430.006.patch,
> HDFS-7430.007.patch, HDFS-7430.008.patch, HDFS-7430.009.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by
> keeping track of what block was scanned last, rather than by tracking the
> scan status of all blocks in memory. Also, instead of having just one
> thread, we should have a verification thread per hard disk (or other volume),
> scanning at a configurable rate of bytes per second.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)