[ 
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279794#comment-14279794
 ] 

Colin Patrick McCabe commented on HDFS-7430:
--------------------------------------------

bq. ... DFSConfigKeys spacing...

Ah, that's what you meant.  Fixed.

* Unused imports \[in FsVolumeImpl\]

Fixed

bq. Would be good to wrap the LOG.trace in save and load with isTraceEnabled, 
which will let us skip doing the writeValueAsString. slf4j just lets us skip 
string construction costs, the parameters are still resolved.

ok

bq. could we still rename eof to something more descriptive? Don't need to 
stick to error code names here.

renamed to 'atEnd'

bq. Regarding the iterator extending Closeable, can't we just add Closeable 
when there's a need for it? This is a private interface after all.

This is going to be needed once we start reference-counting FsVolume objects.  
The close method will decrement a reference count on the volume.  Lei Xu is 
working on that change now.

bq. TestOverReplicatedBlocks, opportunity to fix the spacing in this line:

ok

bq. INTERNAL_DFS_BLOCK_SCANNER_THRESHOLD_BYTES_PER_SECOND_DEFAULT is never 
used? IIRC this was a "min" to avoid dribbly little scans, but I'm ok with not 
having that.

yeah, it's unused. removed.

bq. testVolumeIteratorImpl is long and the flow control is quite complicated, 
is it possible to simplify / break this up? Seems worth it even at the cost of 
some duplicated code.

So, the issue with this test is that it's testing a bunch of stuff, including 
rewind, load, and save.  I think there is some value in testing the combination 
of all these things, really stressing it.  It might look a little cleaner if 
the "if blocksprocessed..." statements were factored out into some kind of 
observer objects, but on the other hand, it would probably be even harder to 
follow the control flow.  I'd say let's refactor this test in a follow-on if 
needed.

bq. A bunch of the log statements prepend "TestBlockScanner:" but isn't this 
unnecessary? Our log4j format includes the log's name, which is 
TestBlockScanner.

yeah, I'll skip logging that.

thanks for the +1, will commit when jenkins returns.  Can file any more work as 
follow ons

> Refactor the BlockScanner to use O(1) memory and use multiple threads
> ---------------------------------------------------------------------
>
>                 Key: HDFS-7430
>                 URL: https://issues.apache.org/jira/browse/HDFS-7430
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch, 
> HDFS-7430.004.patch, HDFS-7430.005.patch, HDFS-7430.006.patch, 
> HDFS-7430.007.patch, HDFS-7430.008.patch, HDFS-7430.009.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by 
> keeping track of what block was scanned last, rather than by tracking the 
> scan status of all blocks in memory.  Also, instead of having just one 
> thread, we should have a verification thread per hard disk (or other volume), 
> scanning at a configurable rate of bytes per second.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to