Re: [influxdb] schema design advice

Sean Beckett Mon, 20 Jun 2016 16:23:45 -0700

On Mon, Jun 20, 2016 at 7:25 AM, Zoltan Szalai <[email protected]>
wrote:


> Hello,
>
> I would like to ask schema design advise for the following use case:
>
> I want to collect time series data of trips of cars.
> I collect about 20-30 different metrics per trip (like speed, rpm,
> location, temperature etc.) with different frequencies ranging from 3
> seconds to 1 minute.
> A car generates about 150-300 trips in a month, 2000-3000 trip a year.
>
> I have some data about the trips stored in an rdbms like trip_id, car,
> driver, start datetime, end datetime and many more and I want to store only
> the raw, collected time series data in influxdb. Among the raw data
> collected from the cars I have to calculate additional time series data
> derived from the collected data, like fuel consumption and also single
> values like distance, average speed etc. typically when a trip ends.
>
> I had/have the following ideas in my mind:
> 1. My first naive approach was to just simple store every trip's data in
> its own measurement (trip_<trip_id>) without any tags but that would
> generate a huge amount of series sooner or later. Since then I've learned
> that series cardinality matters the most in terms of memory usage and
> performance so it's obviously not a good choice. So any option that would
> use the trip ids as tag values sounds like a really bad idea because I'll
> have a lot of trips.
>

Encoding metadata in the measurement name is a bad practice. You should
have dozens, maybe hundreds of measurements. More than that is a schema
smell.


> 2. Store all the data in a single measurement and use for example the car
> id as a tag. That would generate giant amount of points in a single
> measurement (is it a problem?) and would make horizontal partitioning of
> the data difficult if I'd need to use multiple hosts / databases later.
>

2-3k trips per year per car. Are you monitoring more than a few hundred
cars? InfluxDB is fine up to about a million series, and then things get
interesting.

Why not shard the cars into separate instances? That way the series
cardinality can be divided up amongst multiple InfluxDB processes. If you
need to query across cars this isn't as useful, but if each car is a
separate consumer it seems reasonable.


> 3. Measurement / per car, no tags. This seems like the best approach for
> me at the moment. Horizontal partitioning would be relatively trivial using
> car id as shard key if the number of cars on the system increases. Series
> cardinality would equal with car cardinality. Obviously in order to get
> only the data of a single trip (which is important) I'd have to rely on the
> meta data (the trip's start and end time) stored in the rdbms. That is true
> for 2. as well.
>

Too many measurements, and with no tags you're losing 50%+ of the
functionality of the query language.

I'm open to new ideas as well. What do you think?
>

Start with your queries, work backwards from there. What questions does the
schema need to support?


>
> Thanks
> Zoltan
>
> --
> Remember to include the InfluxDB version number with all issue reports
> --- You received this message because you are subscribed to the Google
> Groups "InfluxDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/influxdb.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/influxdb/f5bb0779-f2ac-d9f3-7078-9709d6369714%40gmail.com
> .
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Sean Beckett
Director of Support and Professional Services
InfluxDB

-- 
Remember to include the InfluxDB version number with all issue reports
--- 
You received this message because you are subscribed to the Google Groups 
"InfluxDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/influxdb/CALGqCvMrxAEUzCVPwuG12aaJkv6Y%3DRKL1pkz%3DMqrLO1NVszwDA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [influxdb] schema design advice

Reply via email to