Maksim Myskov created HDDS-6225:
-----------------------------------
Summary: Pipelines distribution across datanodes is uneven
Key: HDDS-6225
URL: https://issues.apache.org/jira/browse/HDDS-6225
Project: Apache Ozone
Issue Type: Bug
Reporter: Maksim Myskov
Although HDDS-5441 was done to improve pipeline distribution, there is
still an issue.
I ran a number of tests:
Configuration1. 6 datanodes, ozone.scm.datanode.disallow.same.peers = true, 3
pipelines
||Datanode||Pipelines||
|8cc97cc8-31dd-463d-8a89-7c76d95d709f|2|
|ab3dc657-736e-49a0-a5c9-e0e4962773d9|2|
|c3f91923-305f-48b1-b38b-2db1e4b92162|2|
|cd16db96-92cb-4056-86f8-563e1ac1bfce|2|
|292584d1-6de5-4249-a601-25a813df997e|2|
|4bf6cae2-1757-4b67-aba4-272d3148c838|2|
Configuration2. 6 datanodes, ozone.scm.datanode.disallow.same.peers = false, 8
pipelines
||Datanode||Pipelines||
|cc131824-ccbe-4e8a-9ec4-e7889275e90b|5|
|164b4515-7735-4fb5-9836-d995c9fafb88|5|
|1a217fc9-9b45-4161-b1e8-fda71a3c38c6|5|
|2f0bcc4e-407a-4e88-8465-d9e5e6ba8bdc|3|
|5df86a41-3c17-4dc1-9e57-410fe5ad7842|1|
|71db1b20-5d89-4919-8566-683c3f24ad5e|5|
Configuration3. 7 datanodes, ozone.scm.datanode.disallow.same.peers = true, 7
pipelines
||Datanode||Pipelines||
|a630e56a-3e1c-47c3-8d49-fb6636a81f2f|3|
|b99a61c9-d907-4536-ac65-e1df254b6190|3|
|005f3c5a-5001-46ef-9016-44114e7b4e32|3|
|213d0cfd-afbb-462e-9f70-2b959a2dbe91|3|
|6531b78c-2ba0-44c2-a7f7-7271f47bd381|3|
|706625cb-15ab-4f5e-a23a-f2de72946490|3|
|77ed9147-6a86-43e1-81fe-ed2378f8f518|3|
Configuration4. 7 datanodes, ozone.scm.datanode.disallow.same.peers = false, 10
pipelines
||Datanode||Pipelines||
|9a019bd1-fa44-4e67-950f-d2e15338b0cd|5|
|c948417a-784e-4dd2-8ca3-715d67ac9891|5|
|d49c9c19-1a60-407e-975b-9fe00a8ac44c|5|
|e2792111-0506-49ed-8096-6bed67675fc4|5|
|0fb9ddcd-5bda-4a8b-bada-39f6f2ec7ec2|5|
|2011f197-6248-4a6f-9c7b-e3611e67d3d0|5|
|782d9313-345c-4297-ab6b-d57b7eb4910f|0|
Configuration5. 8 datanodes, ozone.scm.datanode.disallow.same.peers = true, 7
pipelines
||Datanode||Pipelines||
|9b1146e2-7ff0-4e25-b42d-7e7217bd01f3|3|
|a593ecab-5237-4c0e-bcd0-411839bcd1ef|3|
|02f2c58a-c172-4e10-84e9-5cae850327f9|3|
|0f18f35a-56a2-42ea-8f0e-7d2a84b16c6c|3|
|3a11728e-637f-4337-a3cf-7b9a274e3614|3|
|446a32f2-fdb3-4386-813c-0e5f1c9f8c31|0|
|56522197-646a-4964-8e94-d9c8f7e8687b|3|
|7f21c5a2-fcd0-4b40-8b0c-cc647c50e6b1|3|
Configuration6. 8 datanodes, ozone.scm.datanode.disallow.same.peers = false, 12
pipelines
||Datanode||Pipelines||
|9eba6024-9c1e-4a63-b0ea-c71593d0efeb|5|
|ad6ff8f0-336f-4ba5-b203-787870d1b7ca|5|
|030f3e80-6ac1-4c87-ab25-6538b50602cd|5|
|09e83e1d-8e9b-429a-8f89-81cba1523487|5|
|226b0722-42f1-47e4-8b90-083e7174a920|5|
|459a8649-4238-407d-9d3e-4d0577acf496|5|
|4680966d-fd24-4f24-831d-084382c93dfe|5|
|4aacc584-138c-4ce3-bb61-64bca8384571|1|
Configuration7. 9 datanodes, ozone.scm.datanode.disallow.same.peers = true, 8
pipelines
||Datanode||Pipelines||
|a22a1248-0627-4616-a896-3d434687b4c9|1|
|b9cc0695-4817-4f7d-ab70-b2257bfb3fc4|3|
|bfd99d04-897b-4695-a328-31195bed7a46|4|
|fefa8454-a79e-4137-95fa-90f1942b5b1c|1|
|0c4a3312-7313-4b23-a5fa-f6bc2f7f1593|3|
|0e8f830e-40b0-46fb-b0f8-8bb07447b816|3|
|490cd956-16d6-401b-af54-9f2570dc044c|3|
|517346bc-4f3a-4c65-9141-0fb9b580bf74|3|
|5ec043f7-b69f-4780-8383-e085569ddc96|3|
Configuration8. 9 datanodes, ozone.scm.datanode.disallow.same.peers = false, 15
pipelines
||Datanode||Pipelines||
|a31576cf-1ff7-4931-b4f1-561e26deefff|5|
|c8ecddeb-1c6e-4f35-b4c7-a535ed95c53f|5|
|fae64881-b815-4fb1-ac8a-6932f7792d69|5|
|13fdcaa1-6032-4fc9-946b-7e92577cbd32|5|
|2024dce3-2de3-416b-bf62-2b4dff54cd9f|5|
|38e6915a-90e7-444f-a6f3-72b0b03de29d|5|
|60206eff-6630-4640-b68b-f3986bf3d299|5|
|68040769-eab6-49bc-a5b4-ebe179d0df09|5|
|7f755ac1-2fcd-48f6-8c18-b82914955f58|5|
Configuration9. 10 datanodes, ozone.scm.datanode.disallow.same.peers = true, 11
pipelines
||Datanode||Pipelines||
|8edb6712-e5c2-4f19-a9d2-be80cb444ddd|4|
|9676338d-125a-4798-98ad-1dd25fbbf150|2|
|d4607619-925a-4d26-85b1-7fb29fdd276a|4|
|d72313a7-4d95-44f8-b020-716ea44d7a24|3|
|daa01019-1595-4951-a404-c7ef88377280|4|
|f26d824a-40e1-4fef-be74-13021ec3383c|4|
|122d4373-16df-406e-b937-07a136031c98|3|
|26b84595-9bb3-40b5-8b19-f678de2f7ed3|4|
|28f8a99d-52e5-429e-b5a1-6cb3f180b2fa|2|
|44d41c19-8f30-48b4-a466-8d8413f21705|3|
Configuration10. 10 datanodes, ozone.scm.datanode.disallow.same.peers = false, 16
pipelines
||Datanode||Pipelines||
|a5b0d0a8-41f0-4110-aef1-73f7a63defa2|5|
|a9600399-19c6-4a4d-bd98-73853807266|5|
|fcefa984-e74f-4370-8bb7-e13fddd549fc|5|
|14a2cd7e-1f10-45f5-8ba0-d6515f6031b|5|
|1fc1c312-c9b7-403f-aa42-db44d07d4812|5|
|27d5af80-0bfe-423e-9b9d-882c380c5b28|3|
|423c5624-683b-4269-9444-b01bc69093a|5|
|5de9099e-0d3d-47a3-a9d2-344f99efaf7c|5|
|5edd32e1-ea56-4485-94cd-42ada48211da|5|
|64bc438e-a00d-4b62-9a62-2e5496058468|5|
Take a look at configuration5 and configuration6. In these cases the property
introduced in HDDS-5441 makes the distribution worse.
Sometimes it helps (configuration4), sometimes it does not.
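
To make the comparison concrete, the per-datanode counts from configuration5 and configuration6 can be reduced to a few numbers. The following is only an illustrative Python sketch: the metric names (`spread`, `idle_nodes`, `ideal_spread`) are made up for this example and are not Ozone terminology, and `ideal_spread` ignores any placement constraints that SCM applies.

```python
# Per-datanode pipeline counts copied from the tables above.
CONFIG5 = [3, 3, 3, 3, 3, 0, 3, 3]  # 8 datanodes, disallow.same.peers = true
CONFIG6 = [5, 5, 5, 5, 5, 5, 5, 1]  # 8 datanodes, disallow.same.peers = false


def summarize(counts):
    """Return simple imbalance metrics for per-datanode pipeline counts."""
    total = sum(counts)
    return {
        "total_memberships": total,
        "min": min(counts),
        "max": max(counts),
        # Difference between the busiest and the least busy datanode.
        "spread": max(counts) - min(counts),
        # Assumption: with no placement constraints, memberships could be
        # dealt out round-robin, so the best spread is 0 or 1.
        "ideal_spread": 0 if total % len(counts) == 0 else 1,
        # Datanodes that take part in no pipeline at all.
        "idle_nodes": sum(1 for c in counts if c == 0),
    }


for name, counts in (("configuration5", CONFIG5), ("configuration6", CONFIG6)):
    print(name, summarize(counts))
```

For these two tables the spread is 3 and 4 respectively, whereas an even placement of the same number of memberships would have a spread of 1.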
The root cause of the issue seems to be the lack of a pipeline rebalancing
mechanism. If the pipeline distribution of a cluster is uneven, I can manually
rebalance it by closing existing pipelines (via the ozone admin tool); SCM then
creates new pipelines to replace the closed ones. It would be great for SCM
to rebalance pipelines automatically (including when a new node is added
to a cluster). Does such a rebalancing mechanism fit the Ozone architecture? Or
is there another way to avoid uneven pipeline distribution?
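
The close-and-recreate idea described above can be modelled with a toy simulation. This is a hypothetical sketch, not SCM code: the function names (`node_counts`, `spread`, `rebalance`) and the datanode names are invented, every pipeline is assumed to have three members, the "close the most loaded pipeline, recreate it on the least loaded datanodes" rule is a greedy heuristic chosen for illustration, and placement rules such as ozone.scm.datanode.disallow.same.peers are not modelled.

```python
def node_counts(pipelines, nodes):
    """Count how many pipelines each datanode is a member of."""
    counts = {n: 0 for n in nodes}
    for members in pipelines:
        for n in members:
            counts[n] += 1
    return counts


def spread(counts):
    """Difference between the busiest and the least busy datanode."""
    return max(counts.values()) - min(counts.values())


def rebalance(pipelines, nodes, members_per_pipeline=3, max_steps=100):
    """Greedy close-and-recreate loop (hypothetical heuristic).

    Stops once the spread is at most 1 or after max_steps iterations;
    it is not guaranteed to reach the best possible distribution.
    """
    pipelines = [frozenset(p) for p in pipelines]
    for _ in range(max_steps):
        counts = node_counts(pipelines, nodes)
        if spread(counts) <= 1:
            break
        # "Close" the pipeline whose members carry the highest total load.
        victim = max(pipelines, key=lambda p: sum(counts[n] for n in p))
        pipelines.remove(victim)
        # "Recreate" it on the least loaded datanodes (ties broken by name).
        counts = node_counts(pipelines, nodes)
        least_loaded = sorted(nodes, key=lambda n: (counts[n], n))
        pipelines.append(frozenset(least_loaded[:members_per_pipeline]))
    return pipelines


# Example: dn4 has just joined a cluster whose pipelines all use dn1..dn3.
nodes = ["dn1", "dn2", "dn3", "dn4"]
before = [{"dn1", "dn2", "dn3"}, {"dn1", "dn2", "dn3"}]
after = rebalance(before, nodes)
print(node_counts(before, nodes))
print(node_counts(after, nodes))
```

In this small example one pipeline is closed and recreated with dn4 as a member, so the new datanode stops being idle. A real mechanism would also have to account for the cost of closing a pipeline that is in use, which this sketch ignores.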