wu-sheng opened a new issue, #10051:
URL: https://github.com/apache/skywalking/issues/10051

   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/skywalking/issues?q=is%3Aissue) and found no 
similar feature requirement.
   
   
   ### Description
   
   In 9.3.0, we enhanced the cache mechanism in #10021 and verified it through 
#10025. Now 60% of the `ID` reads by the persistent worker at the minute 
dimensionality are served from the cache. Only the first bulk of every minute 
requires an `ID` read.
   
   Looking deeper into the logic, we find that, **at the minute 
dimensionality, this `ID` read is not actually required**.
   The concern is that a metric missing from the cache is not necessarily 
missing from the database, especially when:
   1. Time is not synced across the cluster, so telemetry timestamps are not 
ordered by the time series.
   2. OAP is booting/rebooting, and the cache is cold.
   
   About <1>, we no longer expect this. Our TTL and metric/topology analysis 
all rely on timestamps being generally synced. They don't have to be synced at 
the `ms` level, but the gap should be no more than 3-5s.
   
   About <2>, given the time-sync assumption in <1>, we should only need to 
load metrics from the DB during the first minute after rebooting, because that 
booting period is the only time new metrics could overlap with existing ones. 
   There is little chance of data conflicts, and even if they occur, only the 
metrics of the booting minute would be inaccurate. As a best practice, we 
could keep loading metrics from the database when a metric's timestamp is 
before the **OAP started timestamp**, as a fail-safe.
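
The fail-safe above can be sketched roughly as follows. This is a minimal 
Python illustration, not the OAP implementation: `MinuteMetricsWorker`, 
`needs_db_read`, `accept`, and the dict-based `db` are hypothetical names, and 
metric values are assumed to be plain integers merged by addition.

```python
from dataclasses import dataclass, field


@dataclass
class MinuteMetricsWorker:
    """Hypothetical stand-in for the minute-dimensionality persistent worker."""

    oap_started_ms: int                          # OAP boot time, epoch millis
    cache: dict = field(default_factory=dict)    # metric ID -> merged value
    db_reads: int = 0                            # number of simulated ID reads

    def needs_db_read(self, metric_id: str, metric_ts_ms: int) -> bool:
        # A cache hit never needs the database.
        if metric_id in self.cache:
            return False
        # Fail-safe: only a metric stamped before boot may already be stored.
        return metric_ts_ms < self.oap_started_ms

    def accept(self, metric_id: str, metric_ts_ms: int, value: int, db: dict) -> int:
        if self.needs_db_read(metric_id, metric_ts_ms):
            self.db_reads += 1
            existing = db.get(metric_id)
            if existing is not None:
                self.cache[metric_id] = existing
        # Merge the incoming value into whatever the cache holds now.
        self.cache[metric_id] = self.cache.get(metric_id, 0) + value
        return self.cache[metric_id]
```

In this sketch, a cache miss on a metric stamped before boot costs one read, 
while later updates to the same ID, and any metric stamped at or after boot, 
skip the database entirely.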
   
   The hour and day dimensionalities are no different. We should just keep 
`loading metrics from database` when the hour/day time bucket is before the 
**OAP started hour / day**.
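
The same check at bucket granularity might look like the sketch below. The 
function names and the `yyyyMMddHHmm`-style integer buckets are assumptions 
made for illustration. Whether the booting hour/day itself counts as "before" 
is not spelled out here, so the hypothetical `include_boot_bucket` flag exposes 
both readings and defaults to including it.

```python
from datetime import datetime, timezone

# Assumed bucket layouts, one per dimensionality (illustrative only).
BUCKET_FORMATS = {
    "minute": "%Y%m%d%H%M",
    "hour": "%Y%m%d%H",
    "day": "%Y%m%d",
}


def time_bucket(ts: datetime, dimensionality: str) -> int:
    """Truncate a timestamp to an integer time bucket, e.g. 2022112910."""
    return int(ts.strftime(BUCKET_FORMATS[dimensionality]))


def should_load_from_db(
    metric_bucket: int,
    oap_started: datetime,
    dimensionality: str,
    include_boot_bucket: bool = True,
) -> bool:
    """Return True when the metric's bucket may already exist in the database."""
    boot_bucket = time_bucket(oap_started, dimensionality)
    if include_boot_bucket:
        # Assumption: the bucket OAP booted in may hold pre-reboot data.
        return metric_bucket <= boot_bucket
    # Strict reading: only buckets earlier than the booting bucket.
    return metric_bucket < boot_bucket
```

Because buckets of one dimensionality share a fixed-width layout, comparing 
them as integers preserves time order.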
   
   @hanahmily @kezhenxu94 @wankai123 PTAL. Since 9.3.0 is releasing soon, I 
don't want to take the risk of changing this for now.
   But the theory should be correct; please help recheck it.
   
   ### Use case
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
