[
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243146#comment-14243146
]
Andrew Wang commented on HDFS-7430:
-----------------------------------
This is a heroic and much needed patch. Thanks for taking this on Colin, mondo
diff here:
{noformat}
32 files changed, 2268 insertions(+), 2414 deletions(-)
{noformat}
I appreciate that a lot of the new lines are dedicated to code comments,
configuration, and tests, and it's still a net reduction in the number of lines
of code. Well done just based on that.
Anyway, as with any patch this large, I do have a few review comments :)
Misc:
* Any reason BlockScanner is contained in DataNode rather than FsDatasetImpl?
It seems like their lifecycles are the same. Might also move to the fsdataset
package if you agree.
* There's a bunch of time conversion scattered about, it'd be better to use
{{TimeUnit.MILLISECONDS.toHours(millis)}} and similar where we can. I like this
form better than {{TimeUnit#convert}} since it's very obvious.
* Javadoc, need to put a {{<p/>}} tag to actually get line-breaks.
* A few lines longer than 80 chars
BlockScanner:
* Can add a log when removeVolumeScanner is called and not enabled
* Could we have a comment about the lock hierarchy? It seems like it's
FSDatasetImpl -> BlockScanner -> VolumeScanner, and some things go direct to
BlockScanner, like initBlockPool and the servlet.
* getVolumeStats and VolumeScanner#getStatistics can be @VisibleForTesting
BlockScanner#Conf:
* IIUC the getUnitTestLong is for the INTERNAL_ config keys, but it's not being
used for MAX_STALENESS_MS. This one is also seemingly not set by any unit
tests, so maybe can be removed.
* Would prefer Precondition checks that abort with a message rather than
Math.max'ing with 0. Users should know when the have an invalid config, maybe
they typoed a minus sign rather than wanting a value of zero. This isn't
necessary for the INTERNAL_ ones though, since they're only set by tests.
VolumeScanner:
* Again a comment about locking would be nice. I see direct usage of FSVolumes
which I think is safe, but dunno really. Various races that lead to FNF too,
which are already described in ScanResultHandler.
* ScanResultHandler can be package-private
* getStatistics and getVolume can be @VisibleForTesting
* I notice some Time.now() being used to calculate an interval in
findNextUsableBlockIter, can this be monotonicNow instead?
* scanBlock, the first series of try/catch, is this going to lead to lots of
log prints? I believe passing in the exception will print a stack trace. I
noticed ScanResultHandler quashes a lot of these.
{code}
long neededBytesPerSec =
conf.targetBytesPerSec - (scannedBytesSum / MINUTES_PER_HOUR);
{code}
Seems like a mismatch here, since this is subtracting the average num bytes
scanned per minute, not per second
{code}
if (neededBytesPerSec <= conf.thresholdBytesPerSec) {
{code}
This should be just {{<}} based on the following log message.
{code}
// Get the genstamp for the block.
{code}
Comment seems unrelated.
Higher-level FsVolumeSpi/FsVolumeImpl BlockIterator stuff:
* Doesn't BlockIterator fit better in BlockPoolSlice, since it's tied to a
block pool?
* Implementing {{Iterator}} and {{Iterable}} might make a lot of sense here.
For instance, {{hasNext}} is more canonical than {{eof}}.
* Also not sure why we bother implementing Closeable, when there's just an
empty {{close()}} implementation.
* Rather than hand-rolled serde, could we use Jackson to do JSON? I think it'll
save some lines of code too.
* Finally, have you considered implementing this as an iterator-of-iterators?
The complexity in this code comes from advancing curDir and curSubDir as
appropriate, but we could encapsulate the logic for each in its own iterator,
e.g. BlockIterator -> SubdirIterator -> SubdirIterator -> SubdirBlockIterator.
Serializing would be saving the current iter position of each. Right now this
is coded in a very C-style with checking nulls.
Detail/nitty BlockIterator stuff:
* Class javadoc for BlockIterator is a bit incorrect, it returns entries from a
block pool, not the entire volume.
* Unused imports
* getSubdirEntries, we can directly set {{long now = Time.monotonicNow}} and
save the else case.
* It'd be nicer to use {{Paths.get}} rather than stringing together all these
File objects. Yay JDK7!
* getNextSubDir, getNextFinalizedDir, getNextFinalizedSubDir, getSubdirEntries
don't need to throw IOException, neither does nextBlock. I'm a bit surprised,
but it turns out listing dirs only throws a RuntimeException on permission
error (which we shouldn't have to worry about). Can clean the resulting
try/catch in VolumeScanner too.
* Is it worth doing rename tricks to atomically save the cursor file? I
remember reading an LWN article about this being surprisingly hard to do
correctly, so maybe worth looking that up too. Could also just double buffer
and look at ctimes too.
* The cache timeout seems rather low. With the default params of 1MB/s and 30s
timeout, we'll timeout each time if blocks are bigger than 30MB.
{code}
if (res < 0) {
res = -res;
}
{code}
Is this OBO? If missing, the return value is {{-index - 1}}. With the code as
it is, we'll never return 0, which is what we want if the first item in the
list was deleted.
DFSConfigKeys:
* Spacing is a little off with rest of file
Tests:
* We axed a bunch of grotty tests. Nice.
* TestFsDatasetImpl, SnapshotTestHelper, unused imports
* I got the following error when running TestBlockScanner:
{noformat}
java.lang.AssertionError: unexpected value of nextBlockPoolScanStartMs -1
(current time is 101474338). stats = Statistics{bytesScannedInPastHour=80,
blocksScannedInCurrentPeriod=0, blocksScannedSinceRestart=16,
scansSinceRestart=3, scanErrorsSinceRestart=0, nextBlockPoolScanStartMs=-1,
blockPoolPeriodEndsMs=1418691762027,
lastBlockScanned=BP-627357344-127.0.1.1-1418331758488:blk_1073741840_0,
eof=true}
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at
org.apache.hadoop.hdfs.server.datanode.TestBlockScanner.testMultipleBlockPoolScanning(TestBlockScanner.java:665)
{noformat}
> Refactor the BlockScanner to use O(1) memory and use multiple threads
> ---------------------------------------------------------------------
>
> Key: HDFS-7430
> URL: https://issues.apache.org/jira/browse/HDFS-7430
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.7.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch,
> HDFS-7430.004.patch, HDFS-7430.005.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by
> keeping track of what block was scanned last, rather than by tracking the
> scan status of all blocks in memory. Also, instead of having just one
> thread, we should have a verification thread per hard disk (or other volume),
> scanning at a configurable rate of bytes per second.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)