[jira] [Commented] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads

Andrew Wang (JIRA) Wed, 17 Dec 2014 15:49:35 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250832#comment-14250832
 ]


Andrew Wang commented on HDFS-7430:
-----------------------------------

Cool, looks like you hit a lot of these. I did another review pass:

Nits: 
* DFSConfigKeys, I agree the spacing is erratic in this files, but adding some 
spaces to line up the variable names would agree with the variables immediately 
around the new keys
* Still need javadoc {{<p/>}} tags in a lot of places. It's not a big deal, so 
if you do another pass and think it looks fine we can leave it.
* TestFsDatasetImpl, FsVolumeImpl, FsDatasetSpi, FsDatasetImpl unused imports
* @VisibleForTesting could be added to 
BlockScanner#Conf#INTERNAL_DFS_BLOCK_SCANNER_THRESHOLD...
* Still some lines longer than 80 chars

Some more time conversions that could be done with TimeUnit:
* VolumeScanner#positiveMsToHours, the else case
* testScanRateImpl

FsDatasetImpl
* I'd still like to use JSON to save the iterator :) Pretty sure Jackson can 
pretty print it for you.
* I also still like the iterator-of-iterators idea a lot, since we could 
probably use the same iterator implementation at each level. Iterating would be 
simpler, the serde would be harder, but overall I think simpler code and more 
friendly for Java programmers.
* BlockIterator still implements Closeable, unnecessary?

VolumeScanner

{code}
      // Find out how many bytes per second we should scan.
      long neededBytesPerSec =
                conf.targetBytesPerSec - (scannedBytesSum / MINUTES_PER_HOUR);
{code}

Still mismatched?

* Guessing the JDK7 file listing goodness is coming in the next patch, since 
it's still using File#list

Tests:
* Did you look into the failed test I posted earlier? Any RCA?
* The bugs found in my previous review seem worth unit testing, e.g the OBO 
with the binarySearch index and the neededBytesPerSec that still looks off, the 
{{<=}} in place of {{<}} that affected continuous scans. Might be fun trying to 
write some actual stripped down unit tests, rather than poking with a full 
minicluster.

> Refactor the BlockScanner to use O(1) memory and use multiple threads
> ---------------------------------------------------------------------
>
>                 Key: HDFS-7430
>                 URL: https://issues.apache.org/jira/browse/HDFS-7430
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch, 
> HDFS-7430.004.patch, HDFS-7430.005.patch, HDFS-7430.006.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by 
> keeping track of what block was scanned last, rather than by tracking the 
> scan status of all blocks in memory.  Also, instead of having just one 
> thread, we should have a verification thread per hard disk (or other volume), 
> scanning at a configurable rate of bytes per second.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads

Reply via email to