[influxdb] Performance considerations: where to aggregate data?

djflix Thu, 21 Jul 2016 03:22:17 -0700

I have an InfluxDB deployment in which I have stored temperature and humidity 
data over a longer period. Recently we added other kinds of data (from 
different sources). In the current setup, however, every 'pair' of sensors 
requires its own continuous queries (as the IDs of the sensors don't match) to 
combine the data and make it queryable in its own measurement. I'm currently 
building some real-life tests to verify the points below, but it would be great 
if someone has already done something similar and can tell me whether this is 
worth attempting at all :). I have a short question, but a slightly longer 
explanation:


Will switching to tagged measurements (as opposed to measurements containing 
multiple values per line) benefit continuous query performance/CPU load?

Consider this example:
SELECT temperature, humidity, movement FROM sensor1_combined
  WHERE time > now() - 1h;

The combined measurement is filled by a pair of continuous queries in the 
following format:
CREATE CONTINUOUS QUERY s1_cli17_15min ON development RESAMPLE FOR 30m BEGIN
  SELECT mean(temperature) AS temperature, mean(humidity) AS humidity
  INTO development."15min".sensor1_combined FROM climate_17
  GROUP BY time(15m) fill(none)
END
CREATE CONTINUOUS QUERY s1_mov12_15min ON development RESAMPLE FOR 30m BEGIN
  SELECT sum(movement) AS movement
  INTO development."15min".sensor1_combined FROM movement_12
  GROUP BY time(15m) fill(none)
END

Note that I have to resample, as often the data of the two sensors does not 
come in at the same time. This is why I resample with 2*15 minutes in the above 
example. For every sensor-pair the above queries also exist for 30m and 60m 
data. This allows me to get less data if I want to visualise longer timespans. 
But with a few hundred sensors (and 2 x 3 continuous queries per sensor), 
the 'idle' InfluxDB CPU usage is around 42%.
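
To make the 2 x 3 count concrete, here is a rough Python sketch of how the statements for one sensor pair could be generated. The helper name build_cqs and its parameters are hypothetical, and it assumes that every query groups by its own interval, resamples over twice that interval, and writes into retention policies named 15min, 30min and 60min; only the 15min naming is confirmed by the example above.

```python
from typing import List, Tuple

# Downsampling intervals in minutes (15m, 30m and 60m, as described in the post).
INTERVALS_MIN = [15, 30, 60]


def build_cqs(db: str, prefix: str, target: str,
              sources: List[Tuple[str, str, str]]) -> List[str]:
    """Return the CREATE CONTINUOUS QUERY statements for one sensor pair.

    Each entry of `sources` is (short_name, measurement, select_clause).
    Hypothetical helper: it only assembles InfluxQL text, it does not talk
    to a server. All inputs are assumed to be trusted constants.
    """
    statements = []
    for minutes in INTERVALS_MIN:
        rp = f"{minutes}min"  # assumed retention policy name
        for short, measurement, select in sources:
            statements.append(
                f"CREATE CONTINUOUS QUERY {prefix}_{short}_{rp} ON {db} "
                f"RESAMPLE FOR {2 * minutes}m BEGIN "  # assumed: 2 x interval
                f'SELECT {select} INTO {db}."{rp}".{target} '
                f"FROM {measurement} GROUP BY time({minutes}m) fill(none) END"
            )
    return statements


sensor1 = build_cqs(
    db="development",
    prefix="s1",
    target="sensor1_combined",
    sources=[
        ("cli17", "climate_17",
         "mean(temperature) AS temperature, mean(humidity) AS humidity"),
        ("mov12", "movement_12", "sum(movement) AS movement"),
    ],
)
```

For the pair above this yields six statements (two sources times three intervals), so a few hundred pairs end up with well over a thousand continuous queries running side by side.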

I understand that this is due to constant querying and the inefficiency of 
running continuous queries on non-tagged series. However, when designing a DB 
structure around tagged data, I cannot seem to get near the simplicity of 
querying that I get when just querying the combined measurements. My current 
queries look like this:
SELECT mean(value) AS temperature FROM temperature
  WHERE d = 'climate_17' AND time > now() - 1h GROUP BY d, time(15m);
SELECT mean(value) AS humidity FROM humidity
  WHERE d = 'climate_17' AND time > now() - 1h GROUP BY d, time(15m);
SELECT sum(value) AS movement FROM movement
  WHERE d = 'movement_12' AND time > now() - 15m GROUP BY d, time(15m);
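
The three statements differ only in the aggregate, the measurement and the device tag, so on the client side they could be assembled per sensor pair. A minimal Python sketch, assuming a hypothetical helper tagged_queries and, for simplicity, the same time window for all three measurements:

```python
from typing import List


def tagged_queries(climate_id: str, movement_id: str,
                   window: str = "1h", interval: str = "15m") -> List[str]:
    """Build the per-pair SELECT statements against the tagged schema.

    Hypothetical helper: the tag key `d` and the field name `value` follow
    the queries in the post. The IDs are assumed to be trusted constants,
    since they are interpolated into the query text without escaping.
    """
    # (aggregate function, measurement name, device tag value)
    parts = [
        ("mean", "temperature", climate_id),
        ("mean", "humidity", climate_id),
        ("sum", "movement", movement_id),
    ]
    return [
        f"SELECT {agg}(value) AS {name} FROM {name} "
        f"WHERE d = '{device}' AND time > now() - {window} "
        f"GROUP BY d, time({interval});"
        for agg, name, device in parts
    ]


queries = tagged_queries("climate_17", "movement_12")
```

This still results in three separate result sets per pair that have to be merged by the client, which is exactly the loss of simplicity compared to the single combined measurement.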

While I could create a continuous query to put this into a measurement, I'm not 
sure whether I would actually gain any performance at all: there might be an 
improvement in combining climate and movement data, but I would still have to 
create a continuous query for every climate-movement combination.

Thanks!
