Mathias, You are of course correct that it is possible to imagine joining any given measurement. I should not have made absolute statements. You asked why we have separate measurements and I explained that they are logical containers to assist in organization.
We are also keenly aware that many users want to JOIN across measurements. It is a challenging feature to say the least, and not on the short term roadmap. Many thousands of users are getting value from InfluxDB without JOINs, so clearly it's not a crippling decision to architect the system this way for the time being. On Tue, Jun 21, 2016 at 2:57 PM, Mathias Herberts < [email protected]> wrote: > You could want to correlate cq stats with other write stats, so somehow > joining them may make perfect sense. > > You should never consider that two measurements will never be joined, this > is othogonal to what data science really is, i.e. exploration in many > different ways to discover patterns or trends. > > On Tuesday, June 21, 2016 at 9:47:37 PM UTC+2, Sean Beckett wrote: >> >> Measurements are a way to segregate related field sets. You can put >> everything into one measurement, but it makes things more confusing. You >> can also put everything in one database, but why do that if they are >> unrelated. >> >> Look at the _internal database for an example of why separate >> measurements makes sense. Why store "write" stats in the same place as "cq" >> stats, or alongside "runtime" information? They aren't related, you'd never >> JOIN them, so might as well keep them logically separate for clarity. >> >> On Tue, Jun 21, 2016 at 1:43 PM, Mathias Herberts <[email protected]> >> wrote: >> >>> One thing I really don't understand is why you have the notion of >>> measurement in the first place. Since fields from different measurements >>> cannot be joined, it would be safer to only have a single measurement, just >>> in case any join should be done in the future. >>> >>> >>> On Tuesday, June 21, 2016 at 6:20:59 PM UTC+2, Sean Beckett wrote: >>>> >>>> >>>> >>>> On Mon, Jun 20, 2016 at 5:56 PM, Zoltan Szalai <[email protected]> >>>> wrote: >>>> >>>>> On 2016.06.21. 1:22, Sean Beckett wrote: >>>>> >>>>> >>>>> >>>>> On Mon, Jun 20, 2016 at 7:25 AM, Zoltan Szalai <[email protected]> >>>>> wrote: >>>>> >>>>>> Hello, >>>>>> >>>>>> I would like to ask schema design advise for the following use case: >>>>>> >>>>>> I want to collect time series data of trips of cars. >>>>>> I collect about 20-30 different metrics per trip (like speed, rpm, >>>>>> location, temperature etc.) with different frequencies ranging from 3 >>>>>> seconds to 1 minute. >>>>>> A car generates about 150-300 trips in a month, 2000-3000 trip a year. >>>>>> >>>>>> I have some data about the trips stored in an rdbms like trip_id, >>>>>> car, driver, start datetime, end datetime and many more and I want to >>>>>> store >>>>>> only the raw, collected time series data in influxdb. Among the raw data >>>>>> collected from the cars I have to calculate additional time series data >>>>>> derived from the collected data, like fuel consumption and also single >>>>>> values like distance, average speed etc. typically when a trip ends. >>>>>> >>>>>> I had/have the following ideas in my mind: >>>>>> 1. My first naive approach was to just simple store every trip's data >>>>>> in its own measurement (trip_<trip_id>) without any tags but that would >>>>>> generate a huge amount of series sooner or later. Since then I've learned >>>>>> that series cardinality matters the most in terms of memory usage and >>>>>> performance so it's obviously not a good choice. So any option that would >>>>>> use the trip ids as tag values sounds like a really bad idea because I'll >>>>>> have a lot of trips. >>>>>> >>>>> >>>>> Encoding metadata in the measurement name is a bad practice. You >>>>> should have dozens, maybe hundreds of measurements. More than that is a >>>>> schema smell. >>>>> >>>>> >>>>> >>>>> Actually in this case, eventually I'd have as many series / >>>>> measurements as trips which will be millions. So yeah, not a good idea. >>>>> >>>> >>>> A series is in no way equivalent to a measurement. Each measurement is >>>> a logical container of perhaps thousands or hundreds of thousands of >>>> series. Each series in a measurement is joined by default for query >>>> results. Series in two different measurements cannot be joined at all. >>>> >>>> You wouldn't have a measurement for each trip. I think you should have >>>> only one measurement for all trips. You would have a series for each trip, >>>> however. >>>> >>>> Total series count is what matters most for RAM usage. >>>> >>>> >>>> >>>> >>>>> >>>>> >>>>> >>>>> 2. Store all the data in a single measurement and use for example the >>>>>> car id as a tag. That would generate giant amount of points in a single >>>>>> measurement (is it a problem?) and would make horizontal partitioning of >>>>>> the data difficult if I'd need to use multiple hosts / databases later. >>>>>> >>>>> >>>>> 2-3k trips per year per car. Are you monitoring more than a few >>>>> hundred cars? InfluxDB is fine up to about a million series, and then >>>>> things get interesting. >>>>> >>>>> >>>>> Not yet, but the number of cars will grow to thousands or perphaps >>>>> tens of thousands (or more) in time. At least, that's the plan. :) >>>>> >>>>> >>>>> Why not shard the cars into separate instances? That way the series >>>>> cardinality can be divided up amongst multiple InfluxDB processes. If you >>>>> need to query across cars this isn't as useful, but if each car is a >>>>> separate consumer it seems reasonable. >>>>> >>>>> >>>>> Basically, that is what I'm suggesting in 3. >>>>> >>>> >>>> No, the series cardinality concerns are per instance, not per database >>>> or per measurement. The entire InfluxDB server shares series cardinality. >>>> The only way to distribute the cardinality load is to put the series on >>>> different InfluxDB servers, whether different nodes in a cluster or >>>> different instances entirely. >>>> >>>> >>>>> >>>>> >>>>> >>>>>> 3. Measurement / per car, no tags. This seems like the best approach >>>>>> for me at the moment. Horizontal partitioning would be relatively trivial >>>>>> using car id as shard key if the number of cars on the system increases. >>>>>> Series cardinality would equal with car cardinality. Obviously in order >>>>>> to >>>>>> get only the data of a single trip (which is important) I'd have to rely >>>>>> on >>>>>> the meta data (the trip's start and end time) stored in the rdbms. That >>>>>> is >>>>>> true for 2. as well. >>>>>> >>>>> >>>>> Too many measurements, and with no tags you're losing 50%+ of the >>>>> functionality of the query language. >>>>> >>>>> >>>>> I'd have as many measurements (and thereby series) as cars in the >>>>> system. And not necessarily in one database. >>>>> I thought only the series cardinality counts, not the measurement >>>>> cardinality. >>>>> >>>> >>>> Measurements are containers of series, they do not exist independent of >>>> the series they contain. Think of them as mini-databases within the >>>> database. Every measurement has at least one series. >>>> >>>> >>>>> What impacts series cardinality the most is the number of different >>>>> tag values if I understand correctly. >>>>> >>>> >>>> The number of actual combinations of tag values. See >>>> https://groups.google.com/d/msgid/influxdb/23486fd5-4f40-4ae7-9573-a09eb55e6683%40googlegroups.com >>>> for more context. >>>> >>>> >>>>> I am not sure how I could utilize tags at his point but maybe I'm >>>>> missing something. >>>>> >>>>> >>>>> I'm open to new ideas as well. What do you think? >>>>>> >>>>> >>>>> Start with your queries, work backwards from there. What questions >>>>> does the schema need to support? >>>>> >>>>> >>>>> >>>>> >>>>> The most important one is: give me all the data that belongs to a >>>>> single trip. (so I can make some calculations from them and generate other >>>>> series). >>>>> for 1. this would be select * from trip_<trip_id> >>>>> for 2. select * from the_one_giant_measurement where car_id = >>>>> <trips_car_id> and time > <trip_start> and time < <trip_end> >>>>> for 3. select * from car_<car_id> where time > <trip_start> and time < >>>>> <trip_end> >>>>> >>>> >>>> I think you want a schema that uses trip_id as a tag, and then you >>>> shard the trips across multiple instances when cardinality becomes a >>>> concern. >>>> >>>> SELECT * FROM trips WHERE trip_id = '1234' >>>> >>>> >>>> >>>>> >>>>> >>>>> Thanks for your time, Sean! >>>>> >>>>> >>>>>> Thanks >>>>>> >>>>>> Zoltan >>>>>> >>>>>> -- >>>>>> Remember to include the InfluxDB version number with all issue reports >>>>>> --- You received this message because you are subscribed to the >>>>>> Google Groups "InfluxDB" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>> send an email to [email protected]. >>>>>> To post to this group, send email to [email protected]. >>>>>> Visit this group at https://groups.google.com/group/influxdb. >>>>>> To view this discussion on the web visit >>>>>> <https://groups.google.com/d/msgid/influxdb/f5bb0779-f2ac-d9f3-7078-9709d6369714%40gmail.com> >>>>>> https://groups.google.com/d/msgid/influxdb/f5bb0779-f2ac-d9f3-7078-9709d6369714%40gmail.com >>>>>> . >>>>>> For more options, visit https://groups.google.com/d/optout. >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Sean Beckett >>>>> Director of Support and Professional Services >>>>> InfluxDB >>>>> -- >>>>> Remember to include the InfluxDB version number with all issue reports >>>>> --- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "InfluxDB" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to [email protected]. >>>>> To post to this group, send email to [email protected]. >>>>> Visit this group at https://groups.google.com/group/influxdb. >>>>> To view this discussion on the web visit >>>>> <https://groups.google.com/d/msgid/influxdb/CALGqCvMrxAEUzCVPwuG12aaJkv6Y%3DRKL1pkz%3DMqrLO1NVszwDA%40mail.gmail.com?utm_medium=email&utm_source=footer> >>>>> https://groups.google.com/d/msgid/influxdb/CALGqCvMrxAEUzCVPwuG12aaJkv6Y%3DRKL1pkz%3DMqrLO1NVszwDA%40mail.gmail.com >>>>> . >>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>>> >>>>> -- >>>>> Remember to include the InfluxDB version number with all issue reports >>>>> --- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "InfluxDB" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>> an email to [email protected]. >>>>> To post to this group, send email to [email protected]. >>>>> Visit this group at https://groups.google.com/group/influxdb. >>>>> To view this discussion on the web visit >>>>> https://groups.google.com/d/msgid/influxdb/d050ba12-d7b9-6f27-b084-fc2d979b9a78%40gmail.com >>>>> <https://groups.google.com/d/msgid/influxdb/d050ba12-d7b9-6f27-b084-fc2d979b9a78%40gmail.com?utm_medium=email&utm_source=footer> >>>>> . >>>>> >>>>> For more options, visit https://groups.google.com/d/optout. >>>>> >>>> >>>> >>>> >>>> -- >>>> Sean Beckett >>>> Director of Support and Professional Services >>>> InfluxDB >>>> >>> -- >>> Remember to include the InfluxDB version number with all issue reports >>> --- >>> You received this message because you are subscribed to the Google >>> Groups "InfluxDB" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> To post to this group, send email to [email protected]. >>> Visit this group at https://groups.google.com/group/influxdb. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/influxdb/02e7516f-e3a4-44b1-a928-d974ce66e2b8%40googlegroups.com >>> <https://groups.google.com/d/msgid/influxdb/02e7516f-e3a4-44b1-a928-d974ce66e2b8%40googlegroups.com?utm_medium=email&utm_source=footer> >>> . >>> >>> For more options, visit https://groups.google.com/d/optout. >>> >> >> >> >> -- >> Sean Beckett >> Director of Support and Professional Services >> InfluxDB >> > -- > Remember to include the InfluxDB version number with all issue reports > --- > You received this message because you are subscribed to the Google Groups > "InfluxDB" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at https://groups.google.com/group/influxdb. > To view this discussion on the web visit > https://groups.google.com/d/msgid/influxdb/14fbb3e3-1c3b-4f12-8416-6d7653ab984e%40googlegroups.com > <https://groups.google.com/d/msgid/influxdb/14fbb3e3-1c3b-4f12-8416-6d7653ab984e%40googlegroups.com?utm_medium=email&utm_source=footer> > . > > For more options, visit https://groups.google.com/d/optout. > -- Sean Beckett Director of Support and Professional Services InfluxDB -- Remember to include the InfluxDB version number with all issue reports --- You received this message because you are subscribed to the Google Groups "InfluxDB" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/influxdb. To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/CALGqCvOTmNrQnrVHf5FZPDe0xOZ8vLSd3uZWJmgreY2xFuW60g%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
