[jira] [Commented] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads

Andrew Wang (JIRA) Thu, 11 Dec 2014 13:11:05 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243146#comment-14243146
 ]


Andrew Wang commented on HDFS-7430:
-----------------------------------

This is a heroic and much needed patch. Thanks for taking this on Colin, mondo 
diff here:

{noformat}
 32 files changed, 2268 insertions(+), 2414 deletions(-)
{noformat}

I appreciate that a lot of the new lines are dedicated to code comments, 
configuration, and tests, and it's still a net reduction in the number of lines 
of code. Well done just based on that.

Anyway, as with any patch this large, I do have a few review comments :)

Misc:
* Any reason BlockScanner is contained in DataNode rather than FsDatasetImpl? 
It seems like their lifecycles are the same. Might also move to the fsdataset 
package if you agree.
* There's a bunch of time conversion scattered about, it'd be better to use 
{{TimeUnit.MILLISECONDS.toHours(millis)}} and similar where we can. I like this 
form better than {{TimeUnit#convert}} since it's very obvious.
* Javadoc, need to put a {{<p/>}} tag to actually get line-breaks.
* A few lines longer than 80 chars

BlockScanner:
* Can add a log when removeVolumeScanner is called and not enabled
* Could we have a comment about the lock hierarchy? It seems like it's 
FSDatasetImpl -> BlockScanner -> VolumeScanner, and some things go direct to 
BlockScanner, like initBlockPool and the servlet.
* getVolumeStats and VolumeScanner#getStatistics can be @VisibleForTesting

BlockScanner#Conf:
* IIUC the getUnitTestLong is for the INTERNAL_ config keys, but it's not being 
used for MAX_STALENESS_MS. This one is also seemingly not set by any unit 
tests, so maybe can be removed.
* Would prefer Precondition checks that abort with a message rather than 
Math.max'ing with 0. Users should know when the have an invalid config, maybe 
they typoed a minus sign rather than wanting a value of zero. This isn't 
necessary for the INTERNAL_ ones though, since they're only set by tests.

VolumeScanner:
* Again a comment about locking would be nice. I see direct usage of FSVolumes 
which I think is safe, but dunno really. Various races that lead to FNF too, 
which are already described in ScanResultHandler.
* ScanResultHandler can be package-private
* getStatistics and getVolume can be @VisibleForTesting
* I notice some Time.now() being used to calculate an interval in 
findNextUsableBlockIter, can this be monotonicNow instead?
* scanBlock, the first series of try/catch, is this going to lead to lots of 
log prints? I believe passing in the exception will print a stack trace. I 
noticed ScanResultHandler quashes a lot of these.

{code}
long neededBytesPerSec =
          conf.targetBytesPerSec - (scannedBytesSum / MINUTES_PER_HOUR);
{code}

Seems like a mismatch here, since this is subtracting the average num bytes 
scanned per minute, not per second

{code}
if (neededBytesPerSec <= conf.thresholdBytesPerSec) {
{code}

This should be just {{<}} based on the following log message.

{code}
    // Get the genstamp for the block.
{code}

Comment seems unrelated.

Higher-level FsVolumeSpi/FsVolumeImpl BlockIterator stuff:
* Doesn't BlockIterator fit better in BlockPoolSlice, since it's tied to a 
block pool?
* Implementing {{Iterator}} and {{Iterable}} might make a lot of sense here. 
For instance, {{hasNext}} is more canonical than {{eof}}.
* Also not sure why we bother implementing Closeable, when there's just an 
empty {{close()}} implementation.
* Rather than hand-rolled serde, could we use Jackson to do JSON? I think it'll 
save some lines of code too.
* Finally, have you considered implementing this as an iterator-of-iterators? 
The complexity in this code comes from advancing curDir and curSubDir as 
appropriate, but we could encapsulate the logic for each in its own iterator, 
e.g. BlockIterator -> SubdirIterator -> SubdirIterator -> SubdirBlockIterator. 
Serializing would be saving the current iter position of each. Right now this 
is coded in a very C-style with checking nulls.

Detail/nitty BlockIterator stuff:
* Class javadoc for BlockIterator is a bit incorrect, it returns entries from a 
block pool, not the entire volume.
* Unused imports
* getSubdirEntries, we can directly set {{long now = Time.monotonicNow}} and 
save the else case.
* It'd be nicer to use {{Paths.get}} rather than stringing together all these 
File objects. Yay JDK7!
* getNextSubDir, getNextFinalizedDir, getNextFinalizedSubDir, getSubdirEntries 
don't need to throw IOException, neither does nextBlock. I'm a bit surprised, 
but it turns out listing dirs only throws a RuntimeException on permission 
error (which we shouldn't have to worry about). Can clean the resulting 
try/catch in VolumeScanner too.
* Is it worth doing rename tricks to atomically save the cursor file? I 
remember reading an LWN article about this being surprisingly hard to do 
correctly, so maybe worth looking that up too. Could also just double buffer 
and look at ctimes too.
* The cache timeout seems rather low. With the default params of 1MB/s and 30s 
timeout, we'll timeout each time if blocks are bigger than 30MB.

{code}
if (res < 0) {
  res = -res;
}
{code}

Is this OBO? If missing, the return value is {{-index - 1}}. With the code as 
it is, we'll never return 0, which is what we want if the first item in the 
list was deleted.

DFSConfigKeys:
* Spacing is a little off with rest of file

Tests:
* We axed a bunch of grotty tests. Nice.
* TestFsDatasetImpl, SnapshotTestHelper, unused imports
* I got the following error when running TestBlockScanner:

{noformat}
java.lang.AssertionError: unexpected value of nextBlockPoolScanStartMs -1 
(current time is 101474338). stats = Statistics{bytesScannedInPastHour=80, 
blocksScannedInCurrentPeriod=0, blocksScannedSinceRestart=16, 
scansSinceRestart=3, scanErrorsSinceRestart=0, nextBlockPoolScanStartMs=-1, 
blockPoolPeriodEndsMs=1418691762027, 
lastBlockScanned=BP-627357344-127.0.1.1-1418331758488:blk_1073741840_0, 
eof=true}
    at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at 
org.apache.hadoop.hdfs.server.datanode.TestBlockScanner.testMultipleBlockPoolScanning(TestBlockScanner.java:665)
{noformat}

> Refactor the BlockScanner to use O(1) memory and use multiple threads
> ---------------------------------------------------------------------
>
>                 Key: HDFS-7430
>                 URL: https://issues.apache.org/jira/browse/HDFS-7430
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch, 
> HDFS-7430.004.patch, HDFS-7430.005.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by 
> keeping track of what block was scanned last, rather than by tracking the 
> scan status of all blocks in memory.  Also, instead of having just one 
> thread, we should have a verification thread per hard disk (or other volume), 
> scanning at a configurable rate of bytes per second.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads

Reply via email to