[
https://issues.apache.org/jira/browse/HDFS-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062322#comment-14062322
]
Colin Patrick McCabe commented on HDFS-6114:
--------------------------------------------
bq. blockInfoSet is required to be sorted by lastScanTime, since the oldest
scanned block is always picked for scanning and will always be the first
element in this set. BlockScanInfo.LAST_SCAN_TIME_COMPARATOR is used because
BlockScanInfo#hashCode() is the default one, which orders by blockId rather
than by scan time. Are you suggesting that I update this hashCode() itself?
I was suggesting that you use a {{TreeSet}} or {{TreeMap}} with the same
comparator as {{blockInfoSet}}. None of the hash set implementations I'm aware
of shrink back down after enlarging.
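For illustration, here is a minimal sketch of that idea, with a simplified
{{BlockScanInfo}} and a stand-in for {{LAST_SCAN_TIME_COMPARATOR}} (the class
shapes and field names below are assumptions, not the real HDFS classes):
{code:java}
import java.util.Comparator;
import java.util.TreeSet;

class BlockScanInfo {
    final long blockId;
    long lastScanTime;

    BlockScanInfo(long blockId, long lastScanTime) {
        this.blockId = blockId;
        this.lastScanTime = lastScanTime;
    }
}

class ScanOrderSketch {
    // Order by lastScanTime, tie-breaking on blockId so that distinct
    // blocks with equal scan times are not collapsed by the TreeSet.
    static final Comparator<BlockScanInfo> LAST_SCAN_TIME_COMPARATOR =
        Comparator.comparingLong((BlockScanInfo b) -> b.lastScanTime)
                  .thenComparingLong(b -> b.blockId);

    private final TreeSet<BlockScanInfo> blockInfoSet =
        new TreeSet<>(LAST_SCAN_TIME_COMPARATOR);

    void addBlockInfo(BlockScanInfo info) {
        blockInfoSet.add(info);
    }

    void delBlockInfo(BlockScanInfo info) {
        blockInfoSet.remove(info); // the tree frees the node right away
    }

    // The oldest-scanned block is always the first element.
    BlockScanInfo pickNextBlockToScan() {
        return blockInfoSet.isEmpty() ? null : blockInfoSet.first();
    }
}
{code}
Unlike a hash table, the {{TreeSet}} releases memory as entries are removed,
and {{first()}} still yields the oldest-scanned block at O(log n) per update.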
bq. So delBlockInfo and delNewBlockInfo serve separate purposes and both are
required.
I can write a version of the patch that only has one del function and only one
add function. I am really reluctant to put in another set of add/del functions
on top of what's already there, since I think it will make things hard to
understand for people trying to modify this code later or backport this patch
to other branches.
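Building on the sketch above, a single add/del pair could look roughly like
this (the {{newBlockInfoSet}} field and the {{isNewBlock}} flag are
assumptions for illustration, not the actual patch):
{code:java}
import java.util.TreeSet;

class SingleAddDelSketch {
    private final TreeSet<BlockScanInfo> blockInfoSet =
        new TreeSet<>(ScanOrderSketch.LAST_SCAN_TIME_COMPARATOR);
    private final TreeSet<BlockScanInfo> newBlockInfoSet =
        new TreeSet<>(ScanOrderSketch.LAST_SCAN_TIME_COMPARATOR);

    // One add function: the caller says whether the block is newly added,
    // and both sets stay consistent here instead of through a parallel
    // addNewBlockInfo() entry point.
    synchronized void addBlockInfo(BlockScanInfo info, boolean isNewBlock) {
        blockInfoSet.add(info);
        if (isNewBlock) {
            newBlockInfoSet.add(info);
        }
    }

    // One del function: removing a block from a set it isn't in is a
    // harmless no-op, so a separate delNewBlockInfo() isn't needed.
    synchronized void delBlockInfo(BlockScanInfo info) {
        blockInfoSet.remove(info);
        newBlockInfoSet.remove(info);
    }
}
{code}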
> Block Scan log rolling will never happen if blocks written continuously
> leading to huge size of dncp_block_verification.log.curr
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-6114
> URL: https://issues.apache.org/jira/browse/HDFS-6114
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.3.0, 2.4.0
> Reporter: Vinayakumar B
> Assignee: Vinayakumar B
> Priority: Critical
> Attachments: HDFS-6114.patch, HDFS-6114.patch
>
>
> 1. {{BlockPoolSliceScanner#scan()}} will not return until all the blocks are
> scanned.
> 2. If blocks (several MB in size) are written to the datanode continuously,
> then one iteration of {{BlockPoolSliceScanner#scan()}} will keep scanning
> blocks without pause.
> 3. These blocks will be deleted after some time (long enough for them to get scanned).
> 4. Since block scanning is throttled, verifying all the blocks takes a very
> long time.
> 5. Rolling will never happen, so even though the total number of blocks in
> the datanode doesn't increase, the entries in *dncp_block_verification.log.curr*
> (including stale entries for deleted blocks) grow continuously, leading to a
> huge file (see the sketch below).
> In one of our environments, it grew to more than 1 TB while the total number
> of blocks was only ~45k.
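>
> A simplified sketch of the failure mode described above (not the actual
> {{BlockPoolSliceScanner}} code; the loop structure, method names, and the
> 100 ms throttle below are illustrative assumptions):
> {code:java}
> import java.util.ArrayDeque;
> import java.util.Queue;
>
> class ScanLoopSketch {
>     private final Queue<Long> blocksToScan = new ArrayDeque<>();
>
>     // Step 1: scan() does not return until every queued block is scanned.
>     void scan() throws InterruptedException {
>         while (!blocksToScan.isEmpty()) {
>             verifyBlock(blocksToScan.poll());
>             Thread.sleep(100); // step 4: scanning is throttled
>             // Step 2: continuous writes keep re-filling blocksToScan
>             // faster than the throttled loop can drain it.
>         }
>         // Step 5: never reached while writes continue, so
>         // dncp_block_verification.log.curr is never rolled and keeps
>         // accumulating stale entries for the blocks deleted in step 3.
>         rollVerificationLogs();
>     }
>
>     private void verifyBlock(long blockId) { /* checksum the block */ }
>     private void rollVerificationLogs()    { /* rotate .curr to .prev */ }
> }
> {code}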
--
This message was sent by Atlassian JIRA
(v6.2#6252)