Re: [influxdb] Re: Data model and Continuous Query: for 200K devices and 10 metrics each

Sean Beckett Fri, 07 Oct 2016 10:18:37 -0700

On Fri, Oct 7, 2016 at 10:20 AM, <[email protected]> wrote:

> Hi Sean,
>
> Thanks for answering my questions.
> BTW, I am using version 1.0
>
> Regarding Model 2, I also added a tag called deviceGroup, so deviceId
> between 1 and 999 will have deviceGroup='1000', deviceId between 1000 and
> 1999 will have deviceGroup='2000', ...
>
> So I can run queries like this
> select * from "data" where deviceGroup = '???' and SYSTEM_MEMORY > 80
>
>
> Regarding the Continuous Query:
>
> I did a quick test processing all devices 35 times (5 seconds between each
> run in my test). No continuous query created. I tested some queries that I
> might use without problems using deviceGroup tag.
>
> After that, I executed the following query:
> select MEAN(*) from monitorData group by deviceId, *
>
> This is what the Continuous Query will use (I will group by time once it
> is in PROD or I have more data in my test), but it never gave me a
> response back.
>


Meaning the query timed out, the process OOM'd, or the query returned
immediately with no results?

With no time restriction on that query, that might be a lot of points.
That's taking the mean of every field from all time, grouped by every tag.
If there are more than a few hundred thousand points that could be quite
expensive.
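
As a rough illustration of the scale, here is a back-of-the-envelope sketch
in Python. It assumes the test covered all 200,000 devices from the original
post, that each run wrote one point per device with 10 fields (Model 2), and
that all 35 runs were still stored. Those inputs are assumptions taken from
the thread, not measurements.

```python
# Rough estimate of what an unbounded "GROUP BY deviceId, *" query touches.
# Assumed inputs (from the thread, not measured): 200,000 devices,
# 10 fields per point (Model 2), 35 test runs, one point per device per run.
DEVICES = 200_000
FIELDS_PER_POINT = 10
RUNS = 35


def scan_cost(devices: int, fields: int, runs: int) -> dict:
    """Return rough counts for a query with no time restriction."""
    points = devices * runs          # every stored point is read
    field_values = points * fields   # MEAN(*) aggregates every field
    groups = devices                 # one result group per deviceId
    return {"points": points, "field_values": field_values, "groups": groups}


cost = scan_cost(DEVICES, FIELDS_PER_POINT, RUNS)
print(cost)
```

Under those assumptions the query reads about 7 million points (70 million
field values) and returns 200,000 groups. Adding a time bound such as
"where time > now() - 1h" limits the scan to recent points.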


> I started processing all devices again (10 more times) and ran the above
> query again, I got the message "Batch could not be sent. Data will be lost"
> in the application that feeds the data points and the query never returned
> results.
>

I don't understand. Did that error come from InfluxDB? Can you share actual
error output rather than descriptions of the output?


>
> I am afraid that the Continuous Query won't be able to handle the volume
> of data and will slow the whole system down to the point that it is not
> usable.
>

How many values per second are you writing to the system?
How many series?
What are the machine specs? RAM, CPU, IOPS in particular
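
For reference, the figures in the original post suggest roughly the
following load. This is only a sketch in Python, assuming 200,000 devices,
10 metrics each, one pass every 5 minutes, and Model 2 (one measurement, a
deviceId tag, each metric stored as a field). Actual numbers from the running
system would be more useful than this estimate.

```python
# Rough ingest-rate estimate for the production plan described in the thread.
# Assumptions: 200,000 devices, 10 metrics each, one pass every 5 minutes,
# Model 2 (one measurement, a deviceId tag, each metric stored as a field).
DEVICES = 200_000
METRICS_PER_DEVICE = 10
INTERVAL_SECONDS = 5 * 60


def ingest_estimate(devices: int, metrics: int, interval_s: int) -> dict:
    """Return approximate write load, averaged evenly over the interval."""
    values_per_pass = devices * metrics
    return {
        "values_per_pass": values_per_pass,
        "values_per_second": values_per_pass / interval_s,
        "points_per_second": devices / interval_s,
        # A series is a measurement plus a tag set. deviceGroup is derived
        # from deviceId, so it adds no extra series.
        "series": devices,
    }


estimate = ingest_estimate(DEVICES, METRICS_PER_DEVICE, INTERVAL_SECONDS)
print(estimate)
```

That works out to 2 million field values per pass, or a little under 7,000
values per second if the writes are spread evenly, across about 200,000
series.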


>
> In PROD, we want to process all devices every 5 minutes, keep that data
> for a day or two, then have a weekly retention policy with data aggregated
> per hour using the Continuous Query mentioned above.
>

That's the canonical use case for CQs and RPs.
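
A minimal sketch of what that setup might look like, written as a small
Python helper that only assembles the InfluxQL statements as strings. The
retention policy names "two_days" and "one_week", the continuous query name
"cq_hourly_mean", and the target measurement "data_hourly" are hypothetical;
"devices", "data", deviceId, and SYSTEM_MEMORY come from the earlier posts.
Only one field is aggregated here to keep the example short.

```python
from typing import List

# Database name taken from the thread; everything else below is hypothetical.
DATABASE = "devices"


def build_statements(db: str) -> List[str]:
    """Build InfluxQL for a short raw policy, a weekly policy, and an hourly CQ."""
    raw_rp = (
        f'CREATE RETENTION POLICY "two_days" ON "{db}" '
        "DURATION 2d REPLICATION 1 DEFAULT"
    )
    weekly_rp = (
        f'CREATE RETENTION POLICY "one_week" ON "{db}" '
        "DURATION 1w REPLICATION 1"
    )
    # Downsample raw points into hourly means, one result series per deviceId.
    hourly_cq = (
        f'CREATE CONTINUOUS QUERY "cq_hourly_mean" ON "{db}" BEGIN '
        'SELECT mean("SYSTEM_MEMORY") AS "SYSTEM_MEMORY_mean" '
        f'INTO "{db}"."one_week"."data_hourly" '
        f'FROM "{db}"."two_days"."data" '
        'GROUP BY time(1h), "deviceId" END'
    )
    return [raw_rp, weekly_rp, hourly_cq]


statements = build_statements(DATABASE)
for statement in statements:
    print(statement)
```

Because the continuous query groups by time(1h), each run only processes a
recent, bounded window of data rather than the whole series history.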


>
> Thanks in advance for your time.
>
> Best regards,
>
>     Carlo
>
>
>
>
>
> On Thursday, September 29, 2016 at 11:18:15 PM UTC-4, Sean Beckett wrote:
> > I would recommend Model 2. Store each metric as a field on the same
> > measurement, with a Device ID tag.
> >
> >
> > >                 Issues:
> > >                 - Queries by value are slow (more than a minute).
> > >                   Example: select * from "data" where SYSTEM_MEMORY > 80
> >
> >
> > Queries that filter by field values are always slow. Field values are
> > not indexed. Running a query unbounded in time forces the system to scan
> > every single point to evaluate the condition.
> >
> >
> > >                 - Continuous Query takes so much time:
> > >                   ... BEGIN SELECT mean(SYSTEM_MEMORY)
> > >                   as SYSTEM_MEMORY_mean INTO .... FROM  data GROUP BY time(5m), deviceId
> >
> >
> > What does "so much time" mean? There doesn't appear to be anything
> > inefficient in the query.
> >
> > You never talked about your data density. How many metrics are written
> > every five minutes?
> >
> > On Tue, Sep 27, 2016 at 7:02 AM, Carlo Vargas <[email protected]> wrote:
> >
> > BTW, for model 1 there will be 6 million data points for database
> > "devices" (600K data points per database/measurement).
> > For model 3, there will be 600K data points per database (each database
> > has its own metric).
> >
> > On Monday, September 26, 2016 at 8:52:19 PM UTC-4, Carlo Vargas wrote:
> >
> >
> > Currently I am evaluating different time series databases and I have
> > some questions regarding data modelling and query performance in InfluxDB.
> >
> > Context: We have 200 000 devices and 10 metrics per device (for
> > instance: SYSTEM_MEMORY).
> >
> >
> > Devices were processed 3 times, so we ended up with 600K data points.
> >
> > Here are the 3 models that were used:
> >
> > Model 1: One database named "devices", 10 measurements (one for each
> > metric), and the tag deviceId.
> >
> >                 Issues:
> >
> >                 - Queries by value are not responding. Example of this
> >                   query: select * from SYSTEM_MEMORY where value > 80
> >                 - It uses a lot of RAM; the server crashes when the
> >                   above query is executed or when the following Continuous
> >                   Query is also executed:
> >                   ... BEGIN SELECT mean(value) as mean_value INTO
> >                   devices."<current_policy>".:MEASUREMENT FROM devices."<new_policy>"./.*/ GROUP
> >                   BY time(5m), deviceId
> >
> > Model 2: One database named "devices", one measurement named "data",
> > deviceId tag and each metric as a field.
> >
> >                 Issues:
> >                 - Queries by value are slow (more than a minute).
> >                   Example: select * from "data" where SYSTEM_MEMORY > 80
> >                 - Continuous Query takes so much time:
> >                   ... BEGIN SELECT mean(SYSTEM_MEMORY)
> >                   as SYSTEM_MEMORY_mean INTO .... FROM  data GROUP BY time(5m), deviceId
> >
> > Model 3: One database per metric, one measurement "data", and deviceId
> > tag.
> >
> >
> >
> >                 Issues:
> >                 - Queries by value take around 25 seconds. Example:
> >                   select * from "data" where value > 80 (this query is run in the
> >                   SYSTEM_MEMORY database)
> >                 - A Continuous Query needs to be created for each database,
> >                   and they are slow.
> >                 - Adding data points is slower than in the previous two models.
> >
> >
> > Any advice/suggestion would be really appreciated.
> >
> > Thanks in advance.
> >
> >
> >
> >
> >
> >
> >
> > --
> >
> > Remember to include the InfluxDB version number with all issue reports
> >
> > ---
> >
> > You received this message because you are subscribed to the Google
> > Groups "InfluxDB" group.
> >
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to [email protected].
> >
> > To post to this group, send email to [email protected].
> >
> > Visit this group at https://groups.google.com/group/influxdb.
> >
> > To view this discussion on the web visit
> > https://groups.google.com/d/msgid/influxdb/4e0d07d0-fcda-4948-9bb0-ec8f93f703af%40googlegroups.com.
> >
> >
> >
> > For more options, visit https://groups.google.com/d/optout.
> >
> >
> >
> >
> >
> > --
> >
> >
> > Sean Beckett
> > Director of Support and Professional Services
> > InfluxDB
>



-- 
Sean Beckett
Director of Support and Professional Services
InfluxDB

