On 2016.06.21. 1:22, Sean Beckett wrote:


On Mon, Jun 20, 2016 at 7:25 AM, Zoltan Szalai <[email protected] <mailto:[email protected]>> wrote:

    Hello,

    I would like to ask schema design advise for the following use case:

    I want to collect time series data of trips of cars.
    I collect about 20-30 different metrics per trip (like speed, rpm,
    location, temperature etc.) with different frequencies ranging
    from 3 seconds to 1 minute.
    A car generates about 150-300 trips in a month, 2000-3000 trip a year.

    I have some data about the trips stored in an rdbms like trip_id,
    car, driver, start datetime, end datetime and many more and I want
    to store only the raw, collected time series data in influxdb.
    Among the raw data collected from the cars I have to calculate
    additional time series data derived from the collected data, like
    fuel consumption and also single values like distance, average
    speed etc. typically when a trip ends.

    I had/have the following ideas in my mind:
    1. My first naive approach was to just simple store every trip's
    data in its own measurement (trip_<trip_id>) without any tags but
    that would generate a huge amount of series sooner or later. Since
    then I've learned that series cardinality matters the most in
    terms of memory usage and performance so it's obviously not a good
    choice. So any option that would use the trip ids as tag values
    sounds like a really bad idea because I'll have a lot of trips.


Encoding metadata in the measurement name is a bad practice. You should have dozens, maybe hundreds of measurements. More than that is a schema smell.


Actually in this case, eventually I'd have as many series / measurements as trips which will be millions. So yeah, not a good idea.


    2. Store all the data in a single measurement and use for example
    the car id as a tag. That would generate giant amount of points in
    a single measurement (is it a problem?) and would make horizontal
    partitioning of the data difficult if I'd need to use multiple
    hosts / databases later.


2-3k trips per year per car. Are you monitoring more than a few hundred cars? InfluxDB is fine up to about a million series, and then things get interesting.


Not yet, but the number of cars will grow to thousands or perphaps tens of thousands (or more) in time. At least, that's the plan. :)


Why not shard the cars into separate instances? That way the series cardinality can be divided up amongst multiple InfluxDB processes. If you need to query across cars this isn't as useful, but if each car is a separate consumer it seems reasonable.

Basically, that is what I'm suggesting in 3.


    3. Measurement / per car, no tags. This seems like the best
    approach for me at the moment. Horizontal partitioning would be
    relatively trivial using car id as shard key if the number of cars
    on the system increases. Series cardinality would equal with car
    cardinality. Obviously in order to get only the data of a single
    trip (which is important) I'd have to rely on the meta data (the
    trip's start and end time) stored in the rdbms. That is true for
    2. as well.


Too many measurements, and with no tags you're losing 50%+ of the functionality of the query language.


I'd have as many measurements (and thereby series) as cars in the system. And not necessarily in one database. I thought only the series cardinality counts, not the measurement cardinality. What impacts series cardinality the most is the number of different tag values if I understand correctly. I am not sure how I could utilize tags at his point but maybe I'm missing something.


    I'm open to new ideas as well. What do you think?


Start with your queries, work backwards from there. What questions does the schema need to support?


The most important one is: give me all the data that belongs to a single trip. (so I can make some calculations from them and generate other series).
for 1. this would be select * from trip_<trip_id>
for 2. select * from the_one_giant_measurement where car_id = <trips_car_id> and time > <trip_start> and time < <trip_end> for 3. select * from car_<car_id> where time > <trip_start> and time < <trip_end>


Thanks for your time, Sean!


    Thanks
    Zoltan

-- Remember to include the InfluxDB version number with all issue reports
    --- You received this message because you are subscribed to the
    Google Groups "InfluxDB" group.
    To unsubscribe from this group and stop receiving emails from it,
    send an email to [email protected]
    <mailto:influxdb%[email protected]>.
    To post to this group, send email to [email protected]
    <mailto:[email protected]>.
    Visit this group at https://groups.google.com/group/influxdb.
    To view this discussion on the web visit
    
https://groups.google.com/d/msgid/influxdb/f5bb0779-f2ac-d9f3-7078-9709d6369714%40gmail.com.
    For more options, visit https://groups.google.com/d/optout.




--
Sean Beckett
Director of Support and Professional Services
InfluxDB
--
Remember to include the InfluxDB version number with all issue reports
---
You received this message because you are subscribed to the Google Groups "InfluxDB" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] <mailto:[email protected]>. To post to this group, send email to [email protected] <mailto:[email protected]>.
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/CALGqCvMrxAEUzCVPwuG12aaJkv6Y%3DRKL1pkz%3DMqrLO1NVszwDA%40mail.gmail.com <https://groups.google.com/d/msgid/influxdb/CALGqCvMrxAEUzCVPwuG12aaJkv6Y%3DRKL1pkz%3DMqrLO1NVszwDA%40mail.gmail.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout.


--
Remember to include the InfluxDB version number with all issue reports
--- You received this message because you are subscribed to the Google Groups "InfluxDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/influxdb/d050ba12-d7b9-6f27-b084-fc2d979b9a78%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to