Re: [influxdb] Re: Data model and Continuous Query: for 200K devices and 10 metrics each

carlo . activia Fri, 07 Oct 2016 09:20:42 -0700

Hi Sean,

Thanks for answering my questions.
BTW, I am using version 1.0.


Regarding Model 2, I also added a tag called deviceGroup: deviceIds 1 to 999 
get deviceGroup='1000', deviceIds 1000 to 1999 get deviceGroup='2000', and so 
on.

So I can run queries like this:
select * from "data" where deviceGroup = '???' and SYSTEM_MEMORY > 80 
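
For illustration, the bucketing described above can be sketched in Python. 
This is only a minimal sketch: the function names device_group and 
to_line_protocol are hypothetical (they are not part of InfluxDB or any client 
library), the bucket size of 1000 is taken from the ranges above, and the 
field value and timestamp in the usage line are made up.

```python
def device_group(device_id: int, bucket_size: int = 1000) -> str:
    """Return the deviceGroup tag value for a device id.

    Ids 1..999 map to '1000', ids 1000..1999 map to '2000', and so on,
    so the tag value is the exclusive upper bound of the bucket.
    """
    if device_id < 0:
        raise ValueError("device_id must be non-negative")
    return str((device_id // bucket_size + 1) * bucket_size)


def to_line_protocol(device_id: int, fields: dict, timestamp_ns: int) -> str:
    """Format one point of the "data" measurement (Model 2) as line protocol.

    Tags: deviceId and deviceGroup. Fields: one float per metric.
    """
    tags = f"deviceId={device_id},deviceGroup={device_group(device_id)}"
    field_set = ",".join(
        f"{name}={float(value)}" for name, value in sorted(fields.items())
    )
    return f"data,{tags} {field_set} {timestamp_ns}"


# Hypothetical usage with an invented reading:
line = to_line_protocol(42, {"SYSTEM_MEMORY": 81.5}, 1475857242000000000)
```

Because deviceGroup is a tag, it is indexed, so the filter on it narrows the 
series to scan before the unindexed SYSTEM_MEMORY field condition is checked.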


Regarding the Continuous Query:

I did a quick test, processing all devices 35 times (5 seconds between runs), 
with no continuous query created. The queries I expect to use ran without 
problems when filtered by the deviceGroup tag.

I then executed the following query:
select MEAN(*) from monitorData group by deviceId, *

This is what the Continuous Query will use (I will add a group by time once it 
is in PROD or once I have more data in my test), but it never returned a 
response.

I then processed all devices 10 more times and ran the above query again. The 
application that feeds the data points reported "Batch could not be sent. Data 
will be lost", and the query never returned results.

I am afraid that the Continuous Query won't be able to handle the volume of 
data and will slow the whole system down to the point of being unusable.

In PROD, we want to process all devices every 5 minutes, keep that data for a 
day or two, then have a weekly retention policy with data aggregated per hour 
using the Continuous Query mentioned above.
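
To put rough numbers on this plan, here is a back-of-the-envelope calculation 
in Python. It is a sketch under assumptions, not a measurement: it assumes 
every device reports all 10 metrics on every 5-minute run, that Model 2 is used 
(one point per device per run, with 10 fields), and that the hourly rollup 
keeps one aggregated point per device per hour for 7 days.

```python
DEVICES = 200_000
METRICS_PER_DEVICE = 10
RUN_INTERVAL_MIN = 5

# Raw data: one run every 5 minutes.
runs_per_day = 24 * 60 // RUN_INTERVAL_MIN                   # 288 runs
points_per_run = DEVICES                                     # Model 2: one point per device
field_values_per_run = DEVICES * METRICS_PER_DEVICE          # 2,000,000
points_per_day = points_per_run * runs_per_day               # 57,600,000
field_values_per_day = field_values_per_run * runs_per_day   # 576,000,000

# Average write rate if a run is spread evenly over its 5-minute window.
avg_points_per_second = points_per_run / (RUN_INTERVAL_MIN * 60)

# Hourly rollup kept for one week: one aggregated point per device per hour.
rollup_points_per_week = DEVICES * 24 * 7                    # 33,600,000

# Each hourly rollup bucket aggregates this many raw points per device.
raw_points_per_rollup = 60 // RUN_INTERVAL_MIN               # 12
```

Under these assumptions, each hourly Continuous Query run has to read about 
2.4 million raw points (200 000 devices times 12 points) and write 200 000 
aggregated points.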

Thanks in advance for your time.

Best regards,

    Carlo





On Thursday, September 29, 2016 at 11:18:15 PM UTC-4, Sean Beckett wrote:
> I would recommend Model 2. Store each metric as a field on the same 
> measurement, with a Device ID tag.
> 
> 
> >                 Issues:
> >                 - Queries by value are slow (more than a minute). Example: 
> > select * from "data" where SYSTEM_MEMORY > 80 
> 
> 
> Queries that filter by field values are always slow. Field values are not 
> indexed. Running a query unbounded in time forces the system to scan every 
> single point to evaluate the condition.
> 
> 
> >                 - Continuous Query takes so much time: 
> >                   ... BEGIN SELECT mean(SYSTEM_MEMORY) as 
> > SYSTEM_MEMORY_mean INTO .... FROM  data GROUP BY time(5m), deviceId
> 
> 
> What does "so much time" mean? There doesn't appear to be anything 
> inefficient in the query. 
> 
> 
> You never talked about your data density. How many metrics are written every 
> five minutes?
> 
> 
> 
> 
> 
> 
> 
> On Tue, Sep 27, 2016 at 7:02 AM, Carlo Vargas <[email protected]> wrote:
> 
> BTW, for model 1 there will be 6 million data points for database "devices" 
> (600K data points per database/measurement). 
> For model 3, there will be 600K data points per database (each database has 
> its own metric).
> 
> 
> 
> 
> 
> 
> 
> 
> On Monday, September 26, 2016 at 8:52:19 PM UTC-4, Carlo Vargas wrote:
> 
> 
> Currently I am evaluating different time series databases and I have some 
> questions regarding data modelling and query performance in InfluxDB.
> 
> Context: We have 200 000 devices and 10 metrics per device (for instance: 
> SYSTEM_MEMORY).
> 
> 
> Devices were processed 3 times, so we ended up with 600K data points.
> 
> Here are the 3 models that were used:
> 
> Model 1: One database named "devices", 10 measurements (one for each metric), 
> and the tag deviceId.
> 
> 
> 
> 
>                 Issues:
> 
>                 - Queries by value are not responding. Example of this query: 
> select * from SYSTEM_MEMORY where value > 80
>                 - It uses a lot of RAM; the server crashes when the above 
> query is executed or when the following Continuous Query is executed:
>                   ... BEGIN SELECT mean(value) as mean_value INTO 
> devices."<current_policy>".:MEASUREMENT FROM devices."<new_policy>"./.*/ 
> GROUP BY time(5m), deviceId
> 
> Model 2: One database named "devices", one measurement named "data", deviceId 
> tag and each metric as a field.                
> 
> 
> 
>                 Issues:
>                 - Queries by value are slow (more than a minute). Example: 
> select * from "data" where SYSTEM_MEMORY > 80 
>                 - Continuous Query takes so much time: 
>                   ... BEGIN SELECT mean(SYSTEM_MEMORY) as SYSTEM_MEMORY_mean 
> INTO .... FROM  data GROUP BY time(5m), deviceId
> 
> Model 3: One database per metric, one measurement "data", and deviceId tag.
> 
> 
> 
>                 Issues:
>                 - Queries by value take around 25 seconds. Example: select * 
> from "data" where value > 80 (this query is run in the SYSTEM_MEMORY database)
>                 - Continuous Query needs to be created for each database and 
> they are slow.
>                 - Adding data points is slower than in the previous two models.
> 
> 
> Any advice/suggestion would be really appreciated.
> 
> Thanks in advance.
> 
> 
> 
> 
> 
> 
> 
> -- 
> 
> Remember to include the InfluxDB version number with all issue reports
> 
> --- 
> 
> You received this message because you are subscribed to the Google Groups 
> "InfluxDB" group.
> 
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> 
> To post to this group, send email to [email protected].
> 
> Visit this group at https://groups.google.com/group/influxdb.
> 
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/influxdb/4e0d07d0-fcda-4948-9bb0-ec8f93f703af%40googlegroups.com.
> 
> 
> 
> For more options, visit https://groups.google.com/d/optout.
> 
> 
> 
> 
> 
> -- 
> 
> 
> Sean Beckett
> Director of Support and Professional Services
> InfluxDB

