tanisdlj opened a new issue #11841: URL: https://github.com/apache/druid/issues/11841
### Affected Version

0.22.2

### Description

- Cluster size: 2 brokers, 2 routers, 2 coordinators, 37 historicals (15 hot, 21 cold, 1 frozen), 2 overlords, 43 middlemanagers
- Steps to reproduce the problem: One morning we found that, during the night, a massive rebalance had happened, leaving many servers at 100% disk usage while others in the same tier were left empty. After restarting the coordinator, segments were better balanced, but we started noticing this issue. Many servers in two different tiers had their disks full while reporting that they were not full.

Coordinator log:

```
Oct 25 08:59:53 druid-master-1 java[19246]: 2021-10-25T08:59:53,185 ERROR [Master-PeonExec--0] org.apache.druid.server.coordinator.HttpLoadQueuePeon - Server[http://stde2-hhot-10.stde2] Failed segment[datasource_2021-09-04T11:00:00.000Z_2021-09-04T12:00:00.000Z_2021-09-04T11:00:00.016Z_226] request[SegmentChangeRequestLoad] with cause [Exception loading segment[datasource_2021-09-04T11:00:00.000Z_2021-09-04T12:00:00.000Z_2021-09-04T11:00:00.016Z_226]].
Oct 25 08:59:53 druid-master-1 java[19246]: 2021-10-25T08:59:53,475 ERROR [Master-PeonExec--0] org.apache.druid.server.coordinator.HttpLoadQueuePeon - Server[http://stde2-hhot-01.stde2] Failed segment[datasource_2021-08-31T19:00:00.000Z_2021-08-31T20:00:00.000Z_2021-08-31T19:00:00.019Z_449] request[SegmentChangeRequestLoad] with cause [Exception loading segment[datasource_2021-08-31T19:00:00.000Z_2021-08-31T20:00:00.000Z_2021-08-31T19:00:00.019Z_449]].
Oct 25 08:59:55 druid-master-1 java[19246]: 2021-10-25T08:59:55,918 ERROR [Master-PeonExec--0] org.apache.druid.server.coordinator.HttpLoadQueuePeon - Server[http://stde2-hhot-01.stde2] Failed segment[datasource_2021-10-24T17:00:00.000Z_2021-10-24T18:00:00.000Z_2021-10-24T17:00:00.012Z_339] request[SegmentChangeRequestLoad] with cause [Exception loading segment[datasource_2021-10-24T17:00:00.000Z_2021-10-24T18:00:00.000Z_2021-10-24T17:00:00.012Z_339]].
```
