[
https://issues.apache.org/jira/browse/HDFS-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062322#comment-14062322
]
Colin Patrick McCabe commented on HDFS-6114:
--------------------------------------------
bq. blockInfoSet is required to be sorted by lastScanTime, since the oldest
scanned block is always picked for scanning and will always be the first
element in this set. BlockScanInfo.LAST_SCAN_TIME_COMPARATOR is used because
BlockScanInfo#hashCode() is the default one, which orders by blockId rather
than by scan time. Are you suggesting that I update this hashCode() itself?
I was suggesting that you use a {{TreeSet}} or {{TreeMap}} with the same
comparator as {{blockInfoSet}}. None of the hash set implementations I'm aware
of shrink back down after enlarging.
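For illustration, here is a minimal sketch of that idea, with a simplified
{{BlockScanInfo}} and a stand-in for {{LAST_SCAN_TIME_COMPARATOR}} (the class
shapes and field names below are assumptions, not the real HDFS classes):
{code:java}
import java.util.Comparator;
import java.util.TreeSet;

class BlockScanInfo {
    final long blockId;
    long lastScanTime;

    BlockScanInfo(long blockId, long lastScanTime) {
        this.blockId = blockId;
        this.lastScanTime = lastScanTime;
    }
}

class ScanOrderSketch {
    // Order by lastScanTime, tie-breaking on blockId so that distinct
    // blocks with equal scan times are not collapsed by the TreeSet.
    static final Comparator<BlockScanInfo> LAST_SCAN_TIME_COMPARATOR =
        Comparator.comparingLong((BlockScanInfo b) -> b.lastScanTime)
                  .thenComparingLong(b -> b.blockId);

    private final TreeSet<BlockScanInfo> blockInfoSet =
        new TreeSet<>(LAST_SCAN_TIME_COMPARATOR);

    void addBlockInfo(BlockScanInfo info) {
        blockInfoSet.add(info);
    }

    void delBlockInfo(BlockScanInfo info) {
        blockInfoSet.remove(info); // the tree frees the node right away
    }

    // The oldest-scanned block is always the first element.
    BlockScanInfo pickNextBlockToScan() {
        return blockInfoSet.isEmpty() ? null : blockInfoSet.first();
    }
}
{code}
Unlike a hash table, the {{TreeSet}} releases memory as entries are removed,
and {{first()}} still yields the oldest-scanned block at O(log n) per update.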
bq. So delBlockInfo and delNewBlockInfo serve separate purposes and both are
required.
I can write a version of the patch that only has one del function and only one
add function. I am really reluctant to put in another set of add/del functions
on top of what's already there, since I think it will make things hard to
understand for people trying to modify this code later or backport this patch
to other branches.
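Building on the sketch above, a single add/del pair could look roughly like
this (the {{newBlockInfoSet}} field and the {{isNewBlock}} flag are
assumptions for illustration, not the actual patch):
{code:java}
import java.util.TreeSet;

class SingleAddDelSketch {
    private final TreeSet<BlockScanInfo> blockInfoSet =
        new TreeSet<>(ScanOrderSketch.LAST_SCAN_TIME_COMPARATOR);
    private final TreeSet<BlockScanInfo> newBlockInfoSet =
        new TreeSet<>(ScanOrderSketch.LAST_SCAN_TIME_COMPARATOR);

    // One add function: the caller says whether the block is newly added,
    // and both sets stay consistent here instead of through a parallel
    // addNewBlockInfo() entry point.
    synchronized void addBlockInfo(BlockScanInfo info, boolean isNewBlock) {
        blockInfoSet.add(info);
        if (isNewBlock) {
            newBlockInfoSet.add(info);
        }
    }

    // One del function: removing a block from a set it isn't in is a
    // harmless no-op, so a separate delNewBlockInfo() isn't needed.
    synchronized void delBlockInfo(BlockScanInfo info) {
        blockInfoSet.remove(info);
        newBlockInfoSet.remove(info);
    }
}
{code}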
> Block Scan log rolling will never happen if blocks written continuously
> leading to huge size of dncp_block_verification.log.curr
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-6114
> URL: https://issues.apache.org/jira/browse/HDFS-6114
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.3.0, 2.4.0
> Reporter: Vinayakumar B
> Assignee: Vinayakumar B
> Priority: Critical
> Attachments: HDFS-6114.patch, HDFS-6114.patch
>
>
> 1. {{BlockPoolSliceScanner#scan()}} will not return until all the blocks are
> scanned.
> 2. If blocks (several MB in size) are written to the datanode continuously,
> then one iteration of {{BlockPoolSliceScanner#scan()}} will keep scanning
> blocks without pause.
> 3. These blocks will be deleted after some time (long enough for them to get scanned).
> 4. Since block scanning is throttled, verifying all the blocks takes a very
> long time.
> 5. Rolling will never happen, so even though the total number of blocks in
> the datanode doesn't increase, the entries in *dncp_block_verification.log.curr*
> (including stale entries for deleted blocks) grow continuously, leading to a
> huge file (see the sketch below).
> In one of our environments, it grew to more than 1 TB while the total number
> of blocks was only ~45k.
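>
> A simplified sketch of the failure mode described above (not the actual
> {{BlockPoolSliceScanner}} code; the loop structure, method names, and the
> 100 ms throttle below are illustrative assumptions):
> {code:java}
> import java.util.ArrayDeque;
> import java.util.Queue;
>
> class ScanLoopSketch {
>     private final Queue<Long> blocksToScan = new ArrayDeque<>();
>
>     // Step 1: scan() does not return until every queued block is scanned.
>     void scan() throws InterruptedException {
>         while (!blocksToScan.isEmpty()) {
>             verifyBlock(blocksToScan.poll());
>             Thread.sleep(100); // step 4: scanning is throttled
>             // Step 2: continuous writes keep re-filling blocksToScan
>             // faster than the throttled loop can drain it.
>         }
>         // Step 5: never reached while writes continue, so
>         // dncp_block_verification.log.curr is never rolled and keeps
>         // accumulating stale entries for the blocks deleted in step 3.
>         rollVerificationLogs();
>     }
>
>     private void verifyBlock(long blockId) { /* checksum the block */ }
>     private void rollVerificationLogs()    { /* rotate .curr to .prev */ }
> }
> {code}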
--
This message was sent by Atlassian JIRA
(v6.2#6252)