[jira] [Commented] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads

Colin Patrick McCabe (JIRA) Tue, 16 Dec 2014 15:04:31 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249132#comment-14249132
 ]


Colin Patrick McCabe commented on HDFS-7430:
--------------------------------------------

bq. getVolumeStats and VolumeScanner#getStatistics can be @VisibleForTesting

ok

bq. IIUC the getUnitTestLong is for the INTERNAL_ config keys, but it's not 
being used for MAX_STALENESS_MS. This one is also seemingly not set by any unit 
tests, so maybe can be removed.

Yeah, we should use {{getUnitTestLong}} here.  Added.  I would prefer to keep 
this here in case we ever wanted to expose it, or if we added a test for it 
later.  The unit tests does test different staleness settings, but using the 
setter directly at the moment.

bq. Would prefer Precondition checks that abort with a message rather than 
Math.max'ing with 0. Users should know when the have an invalid config, maybe 
they typoed a minus sign rather than wanting a value of zero. This isn't 
necessary for the INTERNAL_ ones though, since they're only set by tests.

So, we already have a convention going on that setting 
{{dfs.datanode.scan.period.hours}} to -1 indicates that the block scanner 
should be disabled.  It's not my favorite convention, but I don't want to break 
user configs over a bikeshed issue.  I would also prefer to handle all config 
keys uniformly, rather than throwing an exception if some were negative, but 
letting others go negative.  I think the likelihood of accidentally setting a 
negative value is pretty small... I would expect most mistakes to be leaving 
out a zero, or just not setting the key at all, etc.

bq. Again a comment about locking would be nice. I see direct usage of 
FSVolumes which I think is safe, but dunno really. 

[~eddyxu] and I have been discussing this in HDFS-7496.  Once that JIRA is 
complete, FsVolume instances will be reference-counted, ensuring that we never 
perform an operation on a volume that has already been removed.  That fits in 
nicely with the current code, because the volume removal code instructs the 
volume scanner to shut down.

bq. Various races that lead to FNF too, which are already described in 
ScanResultHandler.

Yeah, these already exist in the current block scanner code.  Eventually maybe 
we can do a spin-wait loop to make sure that we can distinguish real block FNF 
from race-condition-induced FNF.  But I think that's something we should do in 
a follow-up... this patch is big enough :)

bq. ScanResultHandler can be package-private

ok

bq. getStatistics and getVolume can be @VisibleForTesting

{{getStatistics}}, sure.  {{getVolume}} is called by {{BlockScanner}}

bq. I notice some Time.now() being used to calculate an interval in 
findNextUsableBlockIter, can this be monotonicNow instead?

The iterator start time is saved in the cursor file, and it is a wall-clock 
time, not a monotonic time.  This is necessary because if we shut down the 
datanode and reboot, we want to be able to pick up in our scan where we left 
off.  But monotonic time resets every time we reboot.  I added a comment in 
{{findNextUsableBlockIter}} to clarify this.

bq. scanBlock, the first series of try/catch, is this going to lead to lots of 
log prints? I believe passing in the exception will print a stack trace. I 
noticed ScanResultHandler quashes a lot of these.

Yeah, let's not print the stack trace... it's not interesting here.  Also, I 
will log a message when {{FsVolumeSpi#getStoredBlock}} returns null instead of 
throwing {{FileNotFoundException}} (apparently it's allowed to do that too :P )

bq. Seems like a mismatch here, since this is subtracting the average num bytes 
scanned per minute, not per second

Yikes.  Yeah, we need the average bytes per *second* here, not the average 
bytes per *minute*, which is what we'd get from just dividing the bytes in the 
last hour by 60.  Fixed.

bq. if (neededBytesPerSec <= conf.thresholdBytesPerSec) { ... should be just < 

Yeah, let's use < instead.  This will also allow setting the threshold to 0 to 
get continuous scan.

bq. Comment seems unrelated.

So, actually the comment is related... we're taking one {{ExtendedBlock}} 
structure, which always has 0 for its genstamp, and looking up another 
{{ExtendedBlock}} structure, which will have a valid genstamp.  It's 
frustrating that we don't have a more expressive type system that could express 
something like "this struct has poolId and blockId, but not genstamp."  Instead 
we've got Block.java which has id / genstamp, and ExtendedBlock.java, which has 
all that plus block pool id.

I will see if I can come up with a better comment in the code.

bq. Doesn't BlockIterator fit better in BlockPoolSlice, since it's tied to a 
block pool?

{{BlockPoolSlice}} is internal to {{FsDatasetImpl}}.  It's not a generic class, 
whereas the volume scanner is intended to be able to work on any volume, not 
just ones from {{FsVolumeImpl}}.

bq. Implementing Iterator and Iterable might make a lot of sense here. For 
instance, hasNext is more canonical than eof.

I did consider that, but I don't think it's really useful.  It's not like you'd 
want to put this in a "for" loop or something.  {{Iterator}} is kind of 
annoying because it has a {{remove}} method (which I certainly don't want to 
support here), and because {{hasNext}} is supposed to return something valid 
even if you haven't called {{next}} yet.  This is annoying.  Another annoying 
thing about {{hasNext}} is that someone could call {{hasNext}}, wait a long 
time, and then call {{next}}.  Then I'd have to return "something" even if all 
blocks had been removed from the volume.  It's a contract I don't really want 
to bind myself to here.

bq. Rather than hand-rolled serde, could we use Jackson to do JSON? I think 
it'll save some lines of code too.

Hmm.  Let me think about that.  I think it's pretty readable as-is, both in the 
code and in the saved file.  The serialization code is VERY short right now and 
I like having a greppable file with one key per line.  If I saved as JSON, I'd 
want to at least pretty-print it so that we had no more than one key per line 
as well.  And there would be some complexity around ensuring that only 
appropriate values got saved.  I don't think the current format is really that 
problematic, and if we want to add a field later that is a list or map, we can 
always make just that line contain JSON.  Another nice thing about the current 
format is that it fits in with how we do {{VERSION}}.

bq. Finally, have you considered implementing this as an iterator-of-iterators? 
The complexity in this code comes from advancing curDir and curSubDir as 
appropriate, but we could encapsulate the logic for each in its own iterator, 
e.g. BlockIterator -> SubdirIterator -> SubdirIterator -> SubdirBlockIterator. 
Serializing would be saving the current iter position of each. Right now this 
is coded in a very C-style with checking nulls.

My gut feeling is that splitting the code up would just make it longer and 
probably not any less repetitious...  Instead of checking whether 
{{getNextCurFinalizedSubDir}} returned {{null}}, you'd check if 
{{finalizedSubDirIter#next}} returned {{null}}.  And all of the "encapsulated" 
iterators would need references to the "encapsulating" iterators, in order to 
build a path.  I think it would get kind of messy.

bq. Class javadoc for BlockIterator is a bit incorrect, it returns entries from 
a block pool, not the entire volume.

Fixed

bq. Unused imports

Removed

one last set of comments to address...

> Refactor the BlockScanner to use O(1) memory and use multiple threads
> ---------------------------------------------------------------------
>
>                 Key: HDFS-7430
>                 URL: https://issues.apache.org/jira/browse/HDFS-7430
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch, 
> HDFS-7430.004.patch, HDFS-7430.005.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by 
> keeping track of what block was scanned last, rather than by tracking the 
> scan status of all blocks in memory.  Also, instead of having just one 
> thread, we should have a verification thread per hard disk (or other volume), 
> scanning at a configurable rate of bytes per second.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HDFS-7430) Refactor the BlockScanner to use O(1) memory and use multiple threads

Reply via email to