Dan Kirkwood created TC-303:
-------------------------------

             Summary: Influx summary query returns results not found in the 
corresponding series query (i.e. max and min) 
                 Key: TC-303
                 URL: https://issues.apache.org/jira/browse/TC-303
             Project: Traffic Control
          Issue Type: Bug
          Components: Traffic Ops
    Affects Versions: 2.1.0
            Reporter: Dan Kirkwood
            Priority: Minor


>From https://github.com/Comcast/traffic_control/issues/539

For example, the influx series query results may look like this:

[
[time, value],
[time, 10],
[time, 20],
[time, 34],
[time, 26]
]

and the influx summary query results for the same timeframe may look like:

{
time: x,
mean: y,
min: 8,
max: 75
}

notice how the min=8 and max=75 is strange because 8 and 75 are not found in 
the series query. this is because the series query is being grouped into 60s 
intervals (which is an average of 6 10s intervals) but the summary query looks 
at every value recorded in influx (on the 10s interval) between the timeframe.

so in this example, i would expect min=10 and max=34

maybe it's possible to run the summary query against the series query results?? 
like a subselect query like this:

SELECT mean(value), percentile(value, 5), percentile(value, 95), 
percentile(value, 98), min(value), max(value), count(value) FROM (SELECT 
sum(value)/count(value) FROM tps_total WHERE cachegroup = 'total' AND 
deliveryservice = 'ds-name' AND time >='2015-09-17T03:38:00-06:00' AND time <= 
'2015-09-17T15:38:00-06:00' GROUP BY time(60s), cachegroup)

this bug pertains to the following api endpoint 
/api/version/deliveryservice_stats.json when data source is influx.

here are a couple sample influx queries:

summary_query #-> $VAR1 = 'SELECT mean(value), percentile(value, 5), 
percentile(value, 95), percentile(value, 98), min(value), max(value), 
count(value) FROM tps_total WHERE time >= '2015-09-17T03:38:00-06:00' AND time 
<= '2015-09-17T15:38:00-06:00' AND cachegroup = 'total' AND deliveryservice = 
'ds-name'';

series_query #-> $VAR1 = 'SELECT sum(value)/count(value) FROM tps_total WHERE 
cachegroup = 'total' AND deliveryservice = 'ds-name' AND time 
>='2015-09-17T03:38:00-06:00' AND time <= '2015-09-17T15:38:00-06:00' GROUP BY 
time(60s), cachegroup';


from [[email protected]]:
this is not fixed. :( if you pass in another interval like 1h, this issue again 
occurs.

this will require a summary query based on a series query or basically a nested 
query as the issue suggested and nested queries are not yet supported in 
influxdb - influxdb/influxdb#52




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to