[
https://issues.apache.org/jira/browse/HDFS-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215076#comment-16215076
]
Weiwei Yang commented on HDFS-12691:
------------------------------------
Hi [~linyiqun]
bq. Because one tx is corresponding to one block
A TX contains multiple to-be-deleted blocks located in a same container of a
deleted key, it's like a map
{noformat}
{container: [B1, B2, B3 ...]}
{noformat}
So number of TXs doesn't equal to number of blocks to delete. You can refer the
design doc in HDFS-11922, see *Designs and Workflows -> 2*. If you delete a big
object, the block list might be pretty long.
As in this context, since KSM/SCM/DN protocol are all async, it makes it is
hard to determine how to do proper throttling on each layer and at the same
time to make it efficient. That was why I suggested to revisit this part later
when we have scale testing.
But one thing clear to me is HDFS-12443, which I believe could improve the
performance a lot. I am not working on it recently as I am totally occupied by
some other stuff, if you want to work on it, feel free to take over. Talking
back to this one, as it is hard to eval the improvement this can make, I
suggest to close this one as not now and revisit when we have a clear thought
how to improve. What do you think?
Thanks
> Ozone: Decrease interval time of SCMBlockDeletingService for improving the
> efficiency
> -------------------------------------------------------------------------------------
>
> Key: HDFS-12691
> URL: https://issues.apache.org/jira/browse/HDFS-12691
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ozone, performance
> Affects Versions: HDFS-7240
> Reporter: Yiqun Lin
> Assignee: Yiqun Lin
>
> After logging elapsed time of each block deletion task, found some places we
> can make some improvements after testing. The logging during testing:
> {noformat}
> 2017-10-20 17:02:55,168 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:02:56,169 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:02:56,451 [SCMBlockDeletingService#0] INFO
> utils.BackgroundService (BackgroundService.java:run(99)) - Running background
> service : SCMBlockDeletingService
> 2017-10-20 17:02:56,755 [KeyDeletingService#0] INFO utils.BackgroundService
> (BackgroundService.java:run(99)) - Running background service :
> KeyDeletingService
> 2017-10-20 17:02:56,758 [KeyDeletingService#1] INFO ksm.KeyDeletingService
> (KeyDeletingService.java:call(99)) - Found 11 to-delete keys in KSM
> 2017-10-20 17:02:56,775 [IPC Server handler 19 on 52342] INFO
> scm.StorageContainerManager
> (StorageContainerManager.java:deleteKeyBlocks(870)) - SCM is informed by
> KSM to delete 11 blocks
> 2017-10-20 17:02:57,182 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:02:57,640 [KeyDeletingService#1] INFO ksm.KeyDeletingService
> (KeyDeletingService.java:call(125)) - Number of key deleted from KSM DB:
> 11, task elapsed time: 885ms
> 2017-10-20 17:02:58,168 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:03,178 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> ...
> 2017-10-20 17:03:04,167 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:05,173 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:06,095 [BlockDeletingService#0] INFO
> utils.BackgroundService (BackgroundService.java:run(99)) - Running background
> service : BlockDeletingService
> 2017-10-20 17:03:06,095 [BlockDeletingService#0] INFO
> background.BlockDeletingService (BlockDeletingService.java:getTasks(109))
> - Plan to choose 10 containers for block deletion, actually returns 0 valid
> containers.
> 2017-10-20 17:03:06,171 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> ...
> 2017-10-20 17:03:54,279 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:55,267 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:56,282 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:56,461 [SCMBlockDeletingService#0] INFO
> utils.BackgroundService (BackgroundService.java:run(99)) - Running background
> service : SCMBlockDeletingService
> 2017-10-20 17:03:56,467 [SCMBlockDeletingService#1] INFO
> block.SCMBlockDeletingService (SCMBlockDeletingService.java:call(129)) -
> Totally added 11 delete blocks command for 1 datanodes, task elapsed time: 6ms
> 2017-10-20 17:03:57,265 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:57,645 [KeyDeletingService#0] INFO utils.BackgroundService
> (BackgroundService.java:run(99)) - Running background service :
> KeyDeletingService
> 2017-10-20 17:03:58,278 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:58,522 [Command processor thread] INFO
> commandhandler.DeleteBlocksCommandHandler
> (DeleteBlocksCommandHandler.java:handle(87)) - Start to delete container
> blocks, TXIDs=[1(0),2(0),3(0),4(0),5(0),6(0),7(0),8(0),9(0),10(0),11(0)],
> numOfContainers=1, numOfBlocks=11
> 2017-10-20 17:03:58,851 [Command processor thread] INFO
> commandhandler.DeleteBlocksCommandHandler
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168)) - Transited
> Block 29352ed9-cee2-4e05-beee-50284ea300c2 to DELETING state in container
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:58,879 [Command processor thread] INFO
> commandhandler.DeleteBlocksCommandHandler
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168)) - Transited
> Block 88903455-756f-4640-8deb-069a5b2266fe to DELETING state in container
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:58,906 [Command processor thread] INFO
> commandhandler.DeleteBlocksCommandHandler
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168)) - Transited
> Block d2ec1e39-f768-4432-88a9-acc2c5d7a7f2 to DELETING state in container
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:58,934 [Command processor thread] INFO
> commandhandler.DeleteBlocksCommandHandler
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168)) - Transited
> Block 2785467a-f05c-49a1-ac9a-fc2819f73eba to DELETING state in container
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:58,962 [Command processor thread] INFO
> commandhandler.DeleteBlocksCommandHandler
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168)) - Transited
> Block c4b39df9-d350-4a02-92eb-8ab3719e70c7 to DELETING state in container
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:58,990 [Command processor thread] INFO
> commandhandler.DeleteBlocksCommandHandler
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168)) - Transited
> Block 0e85a4d3-2c6a-4822-9b9e-947451a8f645 to DELETING state in container
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:59,017 [Command processor thread] INFO
> commandhandler.DeleteBlocksCommandHandler
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168)) - Transited
> Block b022c3bb-059c-47ac-83ca-49ae281765fb to DELETING state in container
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:59,045 [Command processor thread] INFO
> commandhandler.DeleteBlocksCommandHandler
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168)) - Transited
> Block 4b76df3d-8b23-46f4-bbc1-98847e7c82bd to DELETING state in container
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:59,073 [Command processor thread] INFO
> commandhandler.DeleteBlocksCommandHandler
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168)) - Transited
> Block 6474e09e-4626-4352-89e5-2e11e6063203 to DELETING state in container
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:59,101 [Command processor thread] INFO
> commandhandler.DeleteBlocksCommandHandler
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168)) - Transited
> Block 4ec42696-5388-4869-958a-714b81e8e5fb to DELETING state in container
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:59,235 [Command processor thread] INFO
> commandhandler.DeleteBlocksCommandHandler
> (DeleteBlocksCommandHandler.java:deleteContainerBlocks(168)) - Transited
> Block 2fdb437c-dc51-4d43-8dfb-9aff0c319712 to DELETING state in container
> 1bfa7356-b07c-4d25-97b2-e66eaff19584
> 2017-10-20 17:03:59,236 [IPC Server handler 3 on 52328] INFO
> scm.StorageContainerManager
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779)) -
> Purging TXID=1 from block deletion log
> 2017-10-20 17:03:59,236 [IPC Server handler 3 on 52328] INFO
> scm.StorageContainerManager
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779)) -
> Purging TXID=2 from block deletion log
> 2017-10-20 17:03:59,237 [IPC Server handler 3 on 52328] INFO
> scm.StorageContainerManager
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779)) -
> Purging TXID=3 from block deletion log
> 2017-10-20 17:03:59,237 [IPC Server handler 3 on 52328] INFO
> scm.StorageContainerManager
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779)) -
> Purging TXID=4 from block deletion log
> 2017-10-20 17:03:59,237 [IPC Server handler 3 on 52328] INFO
> scm.StorageContainerManager
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779)) -
> Purging TXID=5 from block deletion log
> 2017-10-20 17:03:59,237 [IPC Server handler 3 on 52328] INFO
> scm.StorageContainerManager
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779)) -
> Purging TXID=6 from block deletion log
> 2017-10-20 17:03:59,237 [IPC Server handler 3 on 52328] INFO
> scm.StorageContainerManager
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779)) -
> Purging TXID=7 from block deletion log
> 2017-10-20 17:03:59,237 [IPC Server handler 3 on 52328] INFO
> scm.StorageContainerManager
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779)) -
> Purging TXID=8 from block deletion log
> 2017-10-20 17:03:59,238 [IPC Server handler 3 on 52328] INFO
> scm.StorageContainerManager
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779)) -
> Purging TXID=9 from block deletion log
> 2017-10-20 17:03:59,238 [IPC Server handler 3 on 52328] INFO
> scm.StorageContainerManager
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779)) -
> Purging TXID=10 from block deletion log
> 2017-10-20 17:03:59,238 [IPC Server handler 3 on 52328] INFO
> scm.StorageContainerManager
> (StorageContainerManager.java:sendContainerBlocksDeletionACK(779)) -
> Purging TXID=11 from block deletion log
> 2017-10-20 17:03:59,265 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:00,279 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:01,266 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:02,266 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:03,268 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:04,278 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:05,275 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:06,099 [BlockDeletingService#1] INFO
> utils.BackgroundService (BackgroundService.java:run(99)) - Running background
> service : BlockDeletingService
> 2017-10-20 17:04:06,099 [BlockDeletingService#1] INFO
> background.BlockDeletingService (BlockDeletingService.java:getTasks(109))
> - Plan to choose 10 containers for block deletion, actually returns 1 valid
> containers.
> 2017-10-20 17:04:06,185 [BlockDeletingService#0] INFO
> background.BlockDeletingService (BlockDeletingService.java:call(224)) -
> Container: 1bfa7356-b07c-4d25-97b2-e66eaff19584, deleted blocks: 11, task
> elapsed time: 84ms
> 2017-10-20 17:04:06,265 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:07,271 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:04:08,269 [Datanode State Machine Thread - 0] INFO
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> {noformat}
> Now three deletion tasks reuse the same interval time for executing
> periodically, but actually this will lead some problem. From the elapsed time
> showed in log, service {{KeyDeletingService}} will cost the most time, then
> {{BlockDeletingService}} and {{SCMBlockDeletingService}} costs least time.
> We should make {{SCMBlockDeletingService}} executed frequently than other two
> service, then the service {{BlockDeletingService}} can work more effectively
> since it relies on {{SCMBlockDeletingService}}'s completion.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]