## Phenomena

1. Expired indices were never removed unless the OAP cluster was rebooted or scaled down to a single instance. The following log appeared in every OAP pod: `The selected first getAddress is xxx.xxx.xx.xx:port. The remove stage is skipped.`
2. An OAP pod could not finish rebooting, endlessly printing `table: xxx does not exist. OAP is running in 'no-init' mode, waiting... retry 3s later.`
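The "remove stage is skipped" log comes from the step where each OAP node checks whether it is the first node in the cluster list and is therefore responsible for index cleanup. Below is a minimal, hypothetical sketch of that leader-check pattern (class and method names are illustrative, not SkyWalking's actual code); it works only when every node sees the same ordering, which is why sorting the list matters:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a deterministic "first node" check for a TTL timer.
// Every OAP node queries the coordinator for the instance list, then only the
// node whose address sorts first runs the remove stage. Without sorting, an
// unordered list (e.g. from the k8s coordinator) can make every node decide
// it is not first, so the remove stage is skipped cluster-wide.
public class TtlLeaderCheck {
    static boolean shouldRunRemoveStage(String selfAddress, List<String> remoteNodes) {
        List<String> sorted = new ArrayList<>(remoteNodes);
        Collections.sort(sorted); // deterministic order: all nodes agree on the leader
        return !sorted.isEmpty() && sorted.get(0).equals(selfAddress);
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("10.0.0.3:11800", "10.0.0.1:11800", "10.0.0.2:11800");
        System.out.println(shouldRunRemoveStage("10.0.0.1:11800", nodes)); // true
        System.out.println(shouldRunRemoveStage("10.0.0.2:11800", nodes)); // false
    }
}
```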
## Root cause

This bug has existed for years, dating back to Oct. 2018. The TTL timer expected `queryRemoteNodes` to always return a consistently ordered list of OAP instances, so that exactly one OAP node would be selected to take responsibility for removing expired indices and creating the latest (today's) indices when rolling. However, as observed in practice, the k8s coordinator does not return an ordered instance list, so it is possible that no OAP node is selected and the TTL timer never actually runs. In that case, most indices are still created normally, because new telemetry data triggers index creation automatically. But:

1. Expired indices are never removed.
2. Features that are not in use produce no new telemetry data, so no index for the latest date is created. On reboot, OAP verifies and expects the latest date's index, which leads to phenomenon <2>.

## Fix

The pull request fixing this is https://github.com/apache/skywalking/pull/9632.

Sheng Wu 吴晟
Twitter, wusheng1108