tanisdlj opened a new issue #11840:
URL: https://github.com/apache/druid/issues/11840
### Affected Version
0.22.2
### Description
- Cluster size: 2 brokers, 2 routers, 2 coordinators, 37 historicals (15
hot, 21 cold, 1 frozen), 2 overlords, 43 middlemanagers
- Configurations in use: every datasource that stores data in the Frozen tier
uses these retention rules:
```
[
{"type":"loadByPeriod","period":"P2M","tieredReplicants":{"stde2-hot":2}},
{"type":"loadByPeriod","period":"P14M","tieredReplicants":{"stde2-cold":1}},
{"type":"loadForever","tieredReplicants":{"stde2-frozen":1}}
]
```
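To make the rule chain above concrete, here is a minimal Python sketch of how the rules would be expected to resolve for a given segment. It assumes that rules are evaluated top to bottom with the first match winning, and that a `loadByPeriod` rule matches when the segment interval overlaps the window ending at the current time. The helper names (`parse_months`, `minus_months`, `matching_tiers`) are hypothetical and not part of Druid, and the period parser only handles the `P<n>M` form used here. The sample segment interval and reference time are taken from the Coordinator log in this report.

```python
import calendar
import json
from datetime import datetime, timezone

# Retention rules copied from this report.
RULES = json.loads("""
[
 {"type":"loadByPeriod","period":"P2M","tieredReplicants":{"stde2-hot":2}},
 {"type":"loadByPeriod","period":"P14M","tieredReplicants":{"stde2-cold":1}},
 {"type":"loadForever","tieredReplicants":{"stde2-frozen":1}}
]
""")


def parse_months(period):
    """Parse a 'P<n>M' ISO-8601 period into a month count (other forms unsupported)."""
    body = period[1:-1]
    if not (period.startswith("P") and period.endswith("M") and body.isdigit()):
        raise ValueError(f"unsupported period: {period}")
    return int(body)


def minus_months(ts, months):
    """Subtract calendar months, clamping the day to the target month's length."""
    index = ts.year * 12 + (ts.month - 1) - months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(ts.day, calendar.monthrange(year, month)[1])
    return ts.replace(year=year, month=month, day=day)


def matching_tiers(rules, seg_start, seg_end, now):
    """Return the tieredReplicants of the first rule that matches the segment."""
    for rule in rules:
        if rule["type"] == "loadForever":
            return rule["tieredReplicants"]
        if rule["type"] == "loadByPeriod":
            window_start = minus_months(now, parse_months(rule["period"]))
            # Assumption: overlap with [window_start, future) counts as a match.
            if seg_end > window_start:
                return rule["tieredReplicants"]
    return {}


if __name__ == "__main__":
    now = datetime(2021, 10, 25, 8, 51, 44, tzinfo=timezone.utc)
    start = datetime(2020, 5, 7, 17, tzinfo=timezone.utc)
    end = datetime(2020, 5, 7, 18, tzinfo=timezone.utc)
    print(matching_tiers(RULES, start, end, now))
```

Under these assumptions, the 2020-05-07 segment is older than both the P2M and P14M windows, so it falls through to `loadForever` and is expected on `stde2-frozen` with one replica, which matches the tier named in the warning.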
Historical (Frozen) config:
```
druid.service=druid/historical
druid.plaintextPort=8083
druid.server.tier=stde2-frozen
druid.server.http.numThreads=90
druid.processing.buffer.sizeBytes=1GiB
druid.processing.numThreads=54
druid.processing.numMergeBuffers=3
druid.segmentCache.locations=[{"path":"/druid/segment-cache","maxSize":"16T","freeSpacePercent":1.0}]
druid.segmentCache.lazyLoadOnStart=false
druid.segmentCache.numLoadingThreads=128
druid.segmentCache.numBootstrapThreads=128
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
druid.cache.type=caffeine
druid.cache.sizeInBytes=1GiB
druid.query.vectorize=true
druid.monitoring.monitors=["org.apache.druid.client.cache.CacheMonitor","org.apache.druid.server.metrics.HistoricalMetricsMonitor"]
```
Coordinator config:
```
druid.service=druid/coordinator
druid.plaintextPort=8081
druid.coordinator.startDelay=PT300S
druid.coordinator.period=PT60S
druid.coordinator.kill.on=true
druid.coordinator.kill.maxSegments=100
druid.coordinator.kill.durationToRetain=P7D
druid.serverview.type=http
druid.coordinator.loadqueuepeon.type=http
druid.coordinator.loadqueuepeon.http.batchSize=56
druid.coordinator.loadqueuepeon.curator.numCallbackThreads=200
druid.coordinator.balancer.strategy=cachingCost
druid.coordinator.balancer.cachingCost.awaitInitialization=true
maxSegmentsInNodeLoadingQueue=1000
druid.announcer.type=http
```
- Steps to reproduce the problem: Add a single server in a new tier and assign
old data to that tier through the retention rules. The server is expected to
load all the segments assigned to it, but it never does.
- The error message or stack traces encountered. Providing more context,
such as nearby log messages or even entire logs, can be helpful:
Coordinator log (the same two messages repeat indefinitely):
```
Oct 25 08:51:44 druid-master-1 java[19246]: 2021-10-25T08:51:44,668 INFO
[Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule -
Loading in progress, skipping drop until loading is complete
Oct 25 08:51:44 druid-master-1 java[19246]: 2021-10-25T08:51:44,668 WARN
[Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - No
available [stde2-frozen] servers or node capacity to assign primary
segment[datasource_2020-05-07T17:00:00.000Z_2020-05-07T18:00:00.000Z_2020-05-07T17:00:00.025Z_104]!
Expected Replicants[1]
Oct 25 08:51:44 druid-master-1 java[19246]: 2021-10-25T08:51:44,668 INFO
[Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule -
Loading in progress, skipping drop until loading is complete
Oct 25 08:51:44 druid-master-1 java[19246]: 2021-10-25T08:51:44,668 WARN
[Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - No
available [stde2-frozen] servers or node capacity to assign primary
segment[datasource_2020-05-07T17:00:00.000Z_2020-05-07T18:00:00.000Z_2020-05-07T17:00:00.025Z_103]!
Expected Replicants[1]
Oct 25 08:51:44 druid-master-1 java[19246]: 2021-10-25T08:51:44,668 INFO
[Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule -
Loading in progress, skipping drop until loading is complete
Oct 25 08:51:44 druid-master-1 java[19246]: 2021-10-25T08:51:44,668 WARN
[Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - No
available [stde2-frozen] servers or node capacity to assign primary
segment[datasource_2020-05-07T17:00:00.000Z_2020-05-07T18:00:00.000Z_2020-05-07T17:00:00.025Z_102]!
Expected Replicants[1]
```
Log of the "Frozen" historical (always the same too)
```
Oct 25 09:19:57 stde2-hfrozen-01.stde2 java[50891]: 2021-10-25T09:19:57,860
INFO [NamespaceExtractionCacheManager-0]
org.apache.druid.server.lookup.namespace.UriCacheGenerator - Finished loading
9,041 values from 9,041 lines for [namespace
[UriExtractionNamespace{uri=file:///usr/share/druid/lookups/xyz.json,
uriPrefix=null, namespaceParseSpec=JSONFlatDataParser{keyFieldName='token',
valueFieldName='categoryName'}, fileRegex='null', pollPeriod=PT30M}] :
org.apache.druid.server.lookup.namespace.cache.CacheScheduler$EntryImpl@2b10ee1d]
in 35,353,269 ns
Oct 25 09:19:57 stde2-hfrozen-01.stde2 java[50891]: 2021-10-25T09:19:57,868
INFO [NamespaceExtractionCacheManager-1]
org.apache.druid.server.lookup.namespace.UriCacheGenerator - Finished loading
10,000 values from 10,000 lines for [namespace
[UriExtractionNamespace{uri=file:///usr/share/druid/lookups/abc.json,
uriPrefix=null, namespaceParseSpec=JSONFlatDataParser{keyFieldName='id',
valueFieldName='size'}, fileRegex='null', pollPeriod=PT30M}] :
org.apache.druid.server.lookup.namespace.cache.CacheScheduler$EntryImpl@46ebec02]
in 43,116,107 ns
```
Frozen disk free:
```
~$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sdb2 439G 5.8G 411G 2% /
/dev/sdb1 488M 95M 368M 21% /boot
/dev/md0 15T 5.8T 8.9T 40% /druid/segment-cache
```
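To compare the `df` figures above with what the Coordinator believes, one option is to read the per-server `currSize` and `maxSize` from the Coordinator's `/druid/coordinator/v1/servers?simple` endpoint. The following is only a sketch: the Coordinator URL in the usage note is an assumption built from the host name in the log and the `druid.plaintextPort` shown above, the field names are the ones I expect in the `simple` response, and `fetch_servers` and `free_bytes_by_tier` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_servers(coordinator_url):
    """Fetch the simple server list from a Coordinator (performs an HTTP GET)."""
    url = f"{coordinator_url}/druid/coordinator/v1/servers?simple"
    with urlopen(url) as resp:
        return json.load(resp)


def free_bytes_by_tier(servers, tier):
    """Map each historical host in the tier to maxSize - currSize, in bytes."""
    return {
        s["host"]: s["maxSize"] - s["currSize"]
        for s in servers
        if s.get("type") == "historical" and s.get("tier") == tier
    }
```

A call such as `free_bytes_by_tier(fetch_servers("http://druid-master-1:8081"), "stde2-frozen")` would then show whether the frozen historical is visible to the Coordinator at all, and how much room it is thought to have.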
Frozen disk free reported by druid console:

Historical Frozen hardware is:
- AMD EPYC 7502P 32 Cores "Rome"
- 256GB RAM DDR4 ECC
- 2 x HDD (RAID 1) 16 TB + 1 SSD root disk (480 GB)