[
https://issues.apache.org/jira/browse/HDFS-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061263#comment-14061263
]
Colin Patrick McCabe commented on HDFS-6114:
--------------------------------------------
I don't really see a good reason to separate {{delBlockInfo}} and
{{delNewBlockInfo}}. It seems like this could just lead to scenarios where we
think we're deleting a block but it pops back up later (because we deleted it
from one set but not from the other).
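Something like the following is what I have in mind (just a sketch against the fields in
the current patch, not actual code): a single delete path that removes the entry from both
sets, so nothing can linger in the "new" set after a delete.
{code}
// Sketch only: one delete path that clears the entry from both sets, so a
// block queued in newBlockInfoSet cannot "pop back up" after being deleted.
private synchronized void delBlockInfo(BlockScanInfo info) {
  blockInfoSet.remove(info);
  newBlockInfoSet.remove(info);
}
{code}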
I guess maybe it makes sense to separate {{addBlockInfo}} from
{{addNewBlockInfo}}, just because there are places in the setup code where
we're willing to add stuff directly to {{blockInfoSet}}. Even in that case, I
would argue it might be easier to call {{addNewBlockInfo}} and then later roll
all the {{newBlockInfoSet}} items into {{blockInfoSet}}. The problem is that
having both functions creates confusion and increases the chance that someone
will add an incorrect call to the wrong one later on in another change.
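For example (again just a sketch, the helper name is made up), the roll-up could live in
one place that drains {{newBlockInfoSet}}:
{code}
// Sketch only: drain everything queued via addNewBlockInfo() into the main
// set in a single place, so callers never add to blockInfoSet directly.
private synchronized void rollNewBlocksIntoBlockInfoSet() {
  blockInfoSet.addAll(newBlockInfoSet);
  newBlockInfoSet.clear();
}
{code}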
{code}
private final SortedSet<BlockScanInfo> blockInfoSet =
    new TreeSet<BlockScanInfo>(BlockScanInfo.LAST_SCAN_TIME_COMPARATOR);
private final Set<BlockScanInfo> newBlockInfoSet =
    new HashSet<BlockScanInfo>();
{code}
It seems like a bad idea to use {{BlockScanInfo.LAST_SCAN_TIME_COMPARATOR}} for
{{blockInfoSet}}, but {{BlockScanInfo#hashCode}} (i.e. the HashSet strategy) for
{{newBlockInfoSet}}. Let's just use a {{SortedSet}} for both so we don't have
to ponder any possible discrepancies between the comparator and the hash
function. Another problem with {{HashSet}} (compared with {{TreeSet}}) is that
it never shrinks down after enlarging... a bad property for a temporary holding
area.
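Concretely, I'm thinking of something like this (sketch, not what's in the patch):
{code}
// Sketch only: both sets ordered by the same comparator, so membership is
// decided identically in both and there is no comparator-vs-hashCode
// discrepancy to reason about.
private final SortedSet<BlockScanInfo> blockInfoSet =
    new TreeSet<BlockScanInfo>(BlockScanInfo.LAST_SCAN_TIME_COMPARATOR);
private final SortedSet<BlockScanInfo> newBlockInfoSet =
    new TreeSet<BlockScanInfo>(BlockScanInfo.LAST_SCAN_TIME_COMPARATOR);
{code}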
> Block Scan log rolling will never happen if blocks written continuously
> leading to huge size of dncp_block_verification.log.curr
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-6114
> URL: https://issues.apache.org/jira/browse/HDFS-6114
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.3.0, 2.4.0
> Reporter: Vinayakumar B
> Assignee: Vinayakumar B
> Priority: Critical
> Attachments: HDFS-6114.patch, HDFS-6114.patch
>
>
> 1. {{BlockPoolSliceScanner#scan()}} will not return until all the blocks are
> scanned.
> 2. If blocks (several MBs in size) are written to the datanode continuously,
> then one iteration of {{BlockPoolSliceScanner#scan()}} will be continuously
> scanning them.
> 3. These blocks will be deleted after some time (long enough for them to get
> scanned).
> 4. Since block scanning is throttled, verification of all the blocks takes a
> long time.
> 5. Rolling will never happen, so even though the total number of blocks in the
> datanode doesn't increase, the entries (which include stale entries for
> deleted blocks) in *dncp_block_verification.log.curr* keep growing, leading to
> a huge file.
> In one of our environments, it grew to more than 1 TB while the total number
> of blocks was only ~45k.
--
This message was sent by Atlassian JIRA
(v6.2#6252)