hanahmily commented on issue #13811:
URL: https://github.com/apache/skywalking/issues/13811#issuecomment-4243761325

   > It appears that the streaming TopN processor in the write phase does not 
perform aggregation pre-computation
   
   It's true
   
   > Assuming a data node has the following data: entity1=10 entity1=20 
entity2=15
   
   There are 3 different scenarios:
   
   1. Single time series (ts) bucket:
   In a single ts bucket, each entity retains only one value in the list. For 
example, the pre-calculated list might be: entity1=20, entity2=15. If 
entity1=10 appears, it is superseded by entity1=20.
   
   2. Multiple ts buckets:
   Example:
   ts1: entity1=10, entity2=5
   ts2: entity1=20, entity2=15
   
   Using the Mean function with topn=2, the result from ts1 to ts2 is 
[entity1=15, entity2=10].
   
   3. Skewed data:
   ts1: entity1=10, entity2=5
   ts2: entity2=15, entity3=7, entity1=5
   
   In this case, entity1 is ignored in ts2: assuming the list keeps only the 
top 2 entries, entity1=5 ranks below entity2=15 and entity3=7 and is dropped 
from that bucket. This behavior aligns with our design goal. The topN 
pre-calculation is an approximate top-k process that sacrifices precision for 
performance. Typically, each list in a time bucket contains about 1,000 
entries; larger lists can be used to improve accuracy. This algorithm aims to 
use system resources efficiently to handle top-k queries involving more than a 
million entities. That's why SkyWalking applies it only to the endpoint topN 
query.
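
The three scenarios above can be reproduced with a small sketch. This is not the BanyanDB implementation; `build_bucket` and `query_topn_mean` are hypothetical helpers, and the sketch assumes that the larger value wins when an entity appears twice in a bucket and that each bucket list is capped at 2 entries (instead of the roughly 1,000 mentioned above) to keep the example readable.

```python
from collections import defaultdict


def build_bucket(samples, capacity):
    """Write phase (assumed): keep one value per entity, then cap the list size."""
    best = {}
    for entity, value in samples:
        # Assumption: the larger value replaces the smaller one for the same entity.
        if entity not in best or value > best[entity]:
            best[entity] = value
    # Rank by value descending (entity name breaks ties) and truncate.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:capacity])


def query_topn_mean(buckets, n):
    """Query phase: average each entity over the buckets that still contain it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for bucket in buckets:
        for entity, value in bucket.items():
            sums[entity] += value
            counts[entity] += 1
    means = {entity: sums[entity] / counts[entity] for entity in sums}
    return sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Scenario 1: single bucket, entity1=10 is superseded by entity1=20.
single = build_bucket([("entity1", 10), ("entity1", 20), ("entity2", 15)], capacity=2)

# Scenario 2: two buckets, Mean with topn=2.
ts1 = build_bucket([("entity1", 10), ("entity2", 5)], capacity=2)
ts2 = build_bucket([("entity1", 20), ("entity2", 15)], capacity=2)
scenario2 = query_topn_mean([ts1, ts2], n=2)

# Scenario 3: skewed data, entity1=5 does not make it into the ts2 list.
ts2_skewed = build_bucket([("entity2", 15), ("entity3", 7), ("entity1", 5)], capacity=2)
scenario3 = query_topn_mean([ts1, ts2_skewed], n=2)

print(single)      # {'entity1': 20, 'entity2': 15}
print(scenario2)   # [('entity1', 15.0), ('entity2', 10.0)]
print(ts2_skewed)  # {'entity2': 15, 'entity3': 7}
print(scenario3)   # [('entity1', 10.0), ('entity2', 10.0)]
```

In scenario 3 the sketch reports entity1=10, while the exact mean over both buckets would be (10 + 5) / 2 = 7.5. That gap is the precision traded away for performance; a larger per-bucket list makes such drops less likely.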
   
   
   
   
   

