[
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250832#comment-14250832
]
Andrew Wang commented on HDFS-7430:
-----------------------------------
Cool, looks like you hit a lot of these. I did another review pass:
Nits:
* DFSConfigKeys, I agree the spacing is erratic in this files, but adding some
spaces to line up the variable names would agree with the variables immediately
around the new keys
* Still need javadoc {{<p/>}} tags in a lot of places. It's not a big deal, so
if you do another pass and think it looks fine we can leave it.
* TestFsDatasetImpl, FsVolumeImpl, FsDatasetSpi, FsDatasetImpl unused imports
* @VisibleForTesting could be added to
BlockScanner#Conf#INTERNAL_DFS_BLOCK_SCANNER_THRESHOLD...
* Still some lines longer than 80 chars
Some more time conversions that could be done with TimeUnit:
* VolumeScanner#positiveMsToHours, the else case
* testScanRateImpl
FsDatasetImpl
* I'd still like to use JSON to save the iterator :) Pretty sure Jackson can
pretty print it for you.
* I also still like the iterator-of-iterators idea a lot, since we could
probably use the same iterator implementation at each level. Iterating would be
simpler, the serde would be harder, but overall I think simpler code and more
friendly for Java programmers.
* BlockIterator still implements Closeable, unnecessary?
VolumeScanner
{code}
// Find out how many bytes per second we should scan.
long neededBytesPerSec =
conf.targetBytesPerSec - (scannedBytesSum / MINUTES_PER_HOUR);
{code}
Still mismatched?
* Guessing the JDK7 file listing goodness is coming in the next patch, since
it's still using File#list
Tests:
* Did you look into the failed test I posted earlier? Any RCA?
* The bugs found in my previous review seem worth unit testing, e.g the OBO
with the binarySearch index and the neededBytesPerSec that still looks off, the
{{<=}} in place of {{<}} that affected continuous scans. Might be fun trying to
write some actual stripped down unit tests, rather than poking with a full
minicluster.
> Refactor the BlockScanner to use O(1) memory and use multiple threads
> ---------------------------------------------------------------------
>
> Key: HDFS-7430
> URL: https://issues.apache.org/jira/browse/HDFS-7430
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.7.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch,
> HDFS-7430.004.patch, HDFS-7430.005.patch, HDFS-7430.006.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by
> keeping track of what block was scanned last, rather than by tracking the
> scan status of all blocks in memory. Also, instead of having just one
> thread, we should have a verification thread per hard disk (or other volume),
> scanning at a configurable rate of bytes per second.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)