[ 
https://issues.apache.org/jira/browse/HDFS-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212652#comment-16212652
 ] 

Weiwei Yang commented on HDFS-12691:
------------------------------------

Hi [~linyiqun]

Thanks for filing this. Well I agree this is definitely a place to improve, but 
the thing I am not sure what is a better value for the interval. I think we 
need some scale testing to understand where the bottleneck is. I generally 
agree that we can let {{KeyDeletingService}} and {{SCMBlockDeletingService}} to 
run on their own interval (run faster than 1minute), but if we get them 
configurable it will be too tricky for user to set a proper value. I tried to 
test this part before, but with single client, I could not find out the 
bottleneck. The test we need is lots of nodes, and lots of concurrent deletes 
on these nodes. Then we'll see how this performs. But at present, I don't have 
a solid idea how to improve this. Any good idea to share? Thanks

> Ozone: Decrease interval time of SCMBlockDeletingService for improving the 
> efficiency
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-12691
>                 URL: https://issues.apache.org/jira/browse/HDFS-12691
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone, performance
>    Affects Versions: HDFS-7240
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>
> After logging elapsed time of each block deletion task, found some places we 
> can make some improvements after testing. The logging during testing:
> {noformat}
> 2017-10-20 17:02:55,168 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:02:56,169 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:02:56,451 [SCMBlockDeletingService#0] INFO  
> utils.BackgroundService (BackgroundService.java:run(99)) - Running background 
> service : SCMBlockDeletingService
> 2017-10-20 17:02:56,755 [KeyDeletingService#0] INFO  utils.BackgroundService 
> (BackgroundService.java:run(99)) - Running background service : 
> KeyDeletingService
> 2017-10-20 17:02:56,758 [KeyDeletingService#1] INFO  ksm.KeyDeletingService 
> (KeyDeletingService.java:call(99))      - Found 11 to-delete keys in KSM
> 2017-10-20 17:02:56,775 [IPC Server handler 19 on 52342] INFO  
> scm.StorageContainerManager 
> (StorageContainerManager.java:deleteKeyBlocks(870))      - SCM is informed by 
> KSM to delete 11 blocks
> 2017-10-20 17:02:57,182 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:02:57,640 [KeyDeletingService#1] INFO  ksm.KeyDeletingService 
> (KeyDeletingService.java:call(125))      - Number of key deleted from KSM DB: 
> 11, task elapsed time: 885ms
> 2017-10-20 17:02:58,168 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:03,178 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> ...
> 2017-10-20 17:03:04,167 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:05,173 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:06,095 [BlockDeletingService#0] INFO  
> utils.BackgroundService (BackgroundService.java:run(99)) - Running background 
> service : BlockDeletingService
> 2017-10-20 17:03:06,095 [BlockDeletingService#0] INFO  
> background.BlockDeletingService (BlockDeletingService.java:getTasks(109))     
>  - Plan to choose 10 containers for block deletion, actually returns 0 valid 
> containers.
> 2017-10-20 17:03:06,171 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> ...
> 2017-10-20 17:03:54,279 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:55,267 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:56,282 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:56,461 [SCMBlockDeletingService#0] INFO  
> utils.BackgroundService (BackgroundService.java:run(99)) - Running background 
> service : SCMBlockDeletingService
> 2017-10-20 17:03:56,467 [SCMBlockDeletingService#1] INFO  
> block.SCMBlockDeletingService (SCMBlockDeletingService.java:call(129))      - 
> Totally added 11 delete blocks command for 1 datanodes, task elapsed time: 6ms
> 2017-10-20 17:03:57,265 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:57,645 [KeyDeletingService#0] INFO  utils.BackgroundService 
> (BackgroundService.java:run(99)) - Running background service : 
> KeyDeletingService
> 2017-10-20 17:03:58,278 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:58,522 [Command processor thread] INFO  
> commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:handle(87))      - Start to delete container 
> blocks, TXIDs=[1(0),2(0),3(0),4(0),5(0),6(0),7(0),8(0),9(0),10(0),11(0)], 
> numOfContainers=1, numOfBlocks=11
> 2017-10-20 17:03:58,851 [Command processor thread] INFO  
> commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168))      - Transited 
> Block 29352ed9-cee2-4e05-beee-50284ea300c2 to DELETING state in container 
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:58,879 [Command processor thread] INFO  
> commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168))      - Transited 
> Block 88903455-756f-4640-8deb-069a5b2266fe to DELETING state in container 
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:58,906 [Command processor thread] INFO  
> commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168))      - Transited 
> Block d2ec1e39-f768-4432-88a9-acc2c5d7a7f2 to DELETING state in container 
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:58,934 [Command processor thread] INFO  
> commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168))      - Transited 
> Block 2785467a-f05c-49a1-ac9a-fc2819f73eba to DELETING state in container 
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:58,962 [Command processor thread] INFO  
> commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168))      - Transited 
> Block c4b39df9-d350-4a02-92eb-8ab3719e70c7 to DELETING state in container 
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:58,990 [Command processor thread] INFO  
> commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168))      - Transited 
> Block 0e85a4d3-2c6a-4822-9b9e-947451a8f645 to DELETING state in container 
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:59,017 [Command processor thread] INFO  
> commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168))      - Transited 
> Block b022c3bb-059c-47ac-83ca-49ae281765fb to DELETING state in container 
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:59,045 [Command processor thread] INFO  
> commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168))      - Transited 
> Block 4b76df3d-8b23-46f4-bbc1-98847e7c82bd to DELETING state in container 
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:59,073 [Command processor thread] INFO  
> commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168))      - Transited 
> Block 6474e09e-4626-4352-89e5-2e11e6063203 to DELETING state in container 
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:59,101 [Command processor thread] INFO  
> commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168))      - Transited 
> Block 4ec42696-5388-4869-958a-714b81e8e5fb to DELETING state in container 
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:59,235 [Command processor thread] INFO  
> commandhandler.DeleteBlocksCommandHandler 
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168))      - Transited 
> Block 2fdb437c-dc51-4d43-8dfb-9aff0c319712 to DELETING state in container 
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:59,236 [IPC Server handler 3 on 52328] INFO  
> scm.StorageContainerManager 
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779))      - 
> Purging TXID=1 from block deletion log
> 2017-10-20 17:03:59,236 [IPC Server handler 3 on 52328] INFO  
> scm.StorageContainerManager 
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779))      - 
> Purging TXID=2 from block deletion log
> 2017-10-20 17:03:59,237 [IPC Server handler 3 on 52328] INFO  
> scm.StorageContainerManager 
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779))      - 
> Purging TXID=3 from block deletion log
> 2017-10-20 17:03:59,237 [IPC Server handler 3 on 52328] INFO  
> scm.StorageContainerManager 
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779))      - 
> Purging TXID=4 from block deletion log
> 2017-10-20 17:03:59,237 [IPC Server handler 3 on 52328] INFO  
> scm.StorageContainerManager 
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779))      - 
> Purging TXID=5 from block deletion log
> 2017-10-20 17:03:59,237 [IPC Server handler 3 on 52328] INFO  
> scm.StorageContainerManager 
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779))      - 
> Purging TXID=6 from block deletion log
> 2017-10-20 17:03:59,237 [IPC Server handler 3 on 52328] INFO  
> scm.StorageContainerManager 
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779))      - 
> Purging TXID=7 from block deletion log
> 2017-10-20 17:03:59,237 [IPC Server handler 3 on 52328] INFO  
> scm.StorageContainerManager 
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779))      - 
> Purging TXID=8 from block deletion log
> 2017-10-20 17:03:59,238 [IPC Server handler 3 on 52328] INFO  
> scm.StorageContainerManager 
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779))      - 
> Purging TXID=9 from block deletion log
> 2017-10-20 17:03:59,238 [IPC Server handler 3 on 52328] INFO  
> scm.StorageContainerManager 
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779))      - 
> Purging TXID=10 from block deletion log
> 2017-10-20 17:03:59,238 [IPC Server handler 3 on 52328] INFO  
> scm.StorageContainerManager 
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779))      - 
> Purging TXID=11 from block deletion log
> 2017-10-20 17:03:59,265 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:00,279 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:01,266 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:02,266 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:03,268 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:04,278 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:05,275 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:06,099 [BlockDeletingService#1] INFO  
> utils.BackgroundService (BackgroundService.java:run(99)) - Running background 
> service : BlockDeletingService
> 2017-10-20 17:04:06,099 [BlockDeletingService#1] INFO  
> background.BlockDeletingService (BlockDeletingService.java:getTasks(109))     
>  - Plan to choose 10 containers for block deletion, actually returns 1 valid 
> containers.
> 2017-10-20 17:04:06,185 [BlockDeletingService#0] INFO  
> background.BlockDeletingService (BlockDeletingService.java:call(224))      - 
> Container: 1bfa7356-b07c-4d25-97b2-e66eaff19584, deleted blocks: 11, task 
> elapsed time: 84ms
> 2017-10-20 17:04:06,265 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:07,271 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:08,269 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> {noformat}
> Now three deletion tasks reuse the same interval time for executing 
> periodically, but actually this will lead some problem. From the elapsed time 
> showed in log, service {{KeyDeletingService}} will cost the most time, then 
> {{BlockDeletingService}} and {{SCMBlockDeletingService}} costs least time.
> We should make {{SCMBlockDeletingService}} executed frequently than other two 
> service, then the service {{BlockDeletingService}} can work more effectively 
> since it relies on {{SCMBlockDeletingService}}'s completion.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to