hanahmily commented on issue #13811:
URL: https://github.com/apache/skywalking/issues/13811#issuecomment-4243761325
> It appears that the streaming TopN processor in the write phase does not
perform aggregation pre-computation
That's true.
> Assuming a data node has the following data: entity1=10 entity1=20
entity2=15
There are 3 different scenarios:
1. Single time series (ts) bucket:
In a single ts bucket, each entity retains only one value in the list. For
example, the pre-calculated list might be: entity1=20, entity2=15. If
entity1=10 appears, it is superseded by entity1=20.
2. Multiple ts buckets:
Example:
ts1: entity1=10, entity2=5
ts2: entity1=20, entity2=15
Using the Mean function with topn=2, the result from ts1 to ts2 is
[entity1=15, entity2=10].
3. Skewed data:
ts1: entity1=10, entity2=5
ts2: entity2=15, entity3=7, entity1=5
In this case, entity1 ranks last in ts2 and falls outside the pre-calculated
list, so it is ignored in ts2. This behavior aligns with our design goal. The
topN pre-calculation is an approximate top-k process that sacrifices precision
for performance. Typically, each list in a time bucket contains about 1,000
entries. Keeping larger lists improves accuracy. This algorithm aims to use
system resources efficiently to handle top-k queries involving more than a
million entities. That's why SkyWalking only applies the endpoint topN query
to it.
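
The three scenarios above can be sketched roughly as follows. This is a
minimal Python illustration, not the actual SkyWalking implementation: the
function names `precompute_bucket` and `query_topn_mean` and the `list_size`
parameter are hypothetical, and two behaviors are assumptions made for the
sketch: the largest value per entity is the one kept, and the Mean is taken
only over the buckets in which an entity survived.

```python
from collections import defaultdict


def precompute_bucket(samples, list_size):
    """Write phase for one ts bucket (hypothetical helper).

    Keeps a single value per entity (assumed: the largest one seen),
    then truncates the list to the `list_size` largest entries.
    """
    best = {}
    for entity, value in samples:
        if entity not in best or value > best[entity]:
            best[entity] = value
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:list_size])


def query_topn_mean(buckets, n):
    """Query phase (hypothetical helper).

    Averages each entity over the buckets where it is present
    (an assumption of this sketch) and returns the top `n` entities.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for bucket in buckets:
        for entity, value in bucket.items():
            sums[entity] += value
            counts[entity] += 1
    means = {entity: sums[entity] / counts[entity] for entity in sums}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Scenario 1: entity1=10 is superseded by entity1=20 within one bucket.
single = precompute_bucket(
    [("entity1", 10), ("entity1", 20), ("entity2", 15)], list_size=1000
)
print(single)  # {'entity1': 20, 'entity2': 15}

# Scenario 2: Mean over two buckets with topn=2.
ts1 = precompute_bucket([("entity1", 10), ("entity2", 5)], list_size=2)
ts2 = precompute_bucket([("entity1", 20), ("entity2", 15)], list_size=2)
print(query_topn_mean([ts1, ts2], n=2))  # [('entity1', 15.0), ('entity2', 10.0)]

# Scenario 3: with a list size of 2, entity1=5 does not survive in ts2.
skewed_ts2 = precompute_bucket(
    [("entity2", 15), ("entity3", 7), ("entity1", 5)], list_size=2
)
print(skewed_ts2)  # {'entity2': 15, 'entity3': 7}
```

In the skewed case, any query over ts1 and ts2 sees entity1 only in ts1, so
its aggregated value is approximate. A larger `list_size` makes such drops
less likely, at the cost of storing more entries per bucket.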