[
https://issues.apache.org/jira/browse/HDFS-7430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14254373#comment-14254373
]
Colin Patrick McCabe commented on HDFS-7430:
--------------------------------------------
I rebased on trunk.
I moved the call to {{BlockScanner#addVolumeScanner}} into {{FsVolumeList}}.
Since the call to {{BlockScanner#removeVolumeScanner}} was there, it seems to
make more sense this way.
bq. DFSConfigKeys, I agree the spacing is erratic in this files, but adding
some spaces to line up the variable names would agree with the variables
immediately around the new keys
If the spacing is erratic, let's fix the spacing. I filed HDFS-7557 to do
this. It will be nicer to do it in a separate jira to avoid muddying the
change history.
bq. Still need javadoc <p/> tags in a lot of places. It's not a big deal, so if
you do another pass and think it looks fine we can leave it.
I did another pass on this and found a few spots I missed the first time. I
agree that we can do a follow-on if there are any more... it's not critical,
just kind of nice.
bq. TestFsDatasetImpl, FsVolumeImpl, FsDatasetSpi, FsDatasetImpl unused imports
OK, I just removed a few and all of them show up as black in Intellij.
Hopefully I got them all this time
bq. \@VisibleForTesting could be added to
BlockScanner#Conf#INTERNAL_DFS_BLOCK_SCANNER_THRESHOLD...
added
bq. Still some lines longer than 80 chars
-> look at
bq. VolumeScanner#positiveMsToHours, the else case \[could be done with
TimeUnit\]
Changed. Also fixed some cases where {{TimeUnit#convert}} was being used
improperly by me (src and dst units were flipped)
bq. testScanRateImpl \[could be done with TimeUnit\]
This is probably a dumb question, but I'm not sure what you are suggesting I
should use {{TimeUnit}} for here?
bq. I'd still like to use JSON to save the iterator Pretty sure Jackson can
pretty print it for you.
OK. I'll use JSON here.
bq. I also still like the iterator-of-iterators idea a lot, since we could
probably use the same iterator implementation at each level. Iterating would be
simpler, the serde would be harder, but overall I think simpler code and more
friendly for Java programmers.
We'll have time to look into this later. Rebasing a big patch like this is
annoying, and it's debatable whether iterator-of-iterators is even better (it's
certainly less efficient)
bq. BlockIterator still implements Closeable, unnecessary?
It might be necessary for other implementations of BlockIterator, besides the
one in {{FsDatasetImpl}}. In general, you might need to create some
storage-system-level iterator.
bq. \[neededBytesPerSec\] Still mismatched?
Very good catch. Yeah, it needs to convert between bytes/hour and bytes/sec
bq. Guessing the JDK7 file listing goodness is coming in the next patch, since
it's still using File#list
Now that I rebased on trunk, I can add this. Added.
bq. Did you look into the failed test I posted earlier? Any RCA?
I think that this probably resulted from the incorrect rate computation.
bq. The bugs found in my previous review seem worth unit testing, e.g the OBO
with the binarySearch index and the neededBytesPerSec that still looks off, the
<= in place of < that affected continuous scans. Might be fun trying to write
some actual stripped down unit tests, rather than poking with a full
minicluster.
OK that's fair. The binarySearch index code path didn't get much exercise,
because the off-by-one appeared only when the directory listing had changed
between our last listDir and the current one. I wrote a unit test just for
{{FsVolumeImpl#nextSorted}} that should verify that this is working as expected.
I will also break {{calculateNeededBytesPerSec}} out into a function and write
a unit test for that
> Refactor the BlockScanner to use O(1) memory and use multiple threads
> ---------------------------------------------------------------------
>
> Key: HDFS-7430
> URL: https://issues.apache.org/jira/browse/HDFS-7430
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.7.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-7430.002.patch, HDFS-7430.003.patch,
> HDFS-7430.004.patch, HDFS-7430.005.patch, HDFS-7430.006.patch, memory.png
>
>
> We should update the BlockScanner to use a constant amount of memory by
> keeping track of what block was scanned last, rather than by tracking the
> scan status of all blocks in memory. Also, instead of having just one
> thread, we should have a verification thread per hard disk (or other volume),
> scanning at a configurable rate of bytes per second.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)