With 180 fields, querying them all together might be RAM intensive. I would definitely recommend using the stress tool to model that schema and the query resource needs.
>> It would be quick to return queries and the field set would be small. > I'm not sure why this is the case. If I always query all N points at a given time to draw the line, don't the option 1 and 2 have roughly the same performance? > I thought A and B scan the same number of series. Am I right? Yes and no. We're currently updating the docs to rename what we currently call a series to the "metaseries". A metaseries is the combination of measurement + tagset. A series is how the data is actually stored in the TSM files, and that's a measurement + tagset + field. Option 1 has ~180 series per metaseries. Option 2 has two series per metaseries. The performance won't be identical, but I hesitate to guess how they would differ. > For example, > A) SELECT * FROM "line" ORDER BY "time" DESC LIMIT 1 # with the 1st schema A couple notes. First, ALWAYS use a "WHERE time" clause with a "SELECT *", or else you can easily OOM the machine. The LIMIT clause doesn't yet restrict the results queried, just the results returned. See https://github.com/influxdata/influxdb/issues/7182 for more.) > B) SELECT * FROM "point" GROUP BY "name" ORDER BY "time" DESC LIMIT 1 # with the 2nd schema This would return all the fields, but it might be clearer to use a "GROUP BY *" clause to separate all the points into their own x,y buckets. On Mon, Oct 3, 2016 at 10:34 PM, Mitsutoshi Aoe <[email protected]> wrote: > Sorry, I had a miscalculation again! > > > Fistly, I had a miscalculation on the number of points N. N is more like > 90. So if I take the first route, the number of fields would be about 90+. > > I meant to say the number of fields would be about 180+. > > Mitsutoshi > > 2016年10月4日(火) 13:32 Mitsutoshi Aoe <[email protected]>: > >> Hi Sean, >> Thank you for your reply. >> >> Fistly, I had a miscalculation on the number of points N. N is more like >> 90. So if I take the first route, the number of fields would be about 90+. >> >> >> > It would be quick to return queries and the field set would be small. >> >> I'm not sure why this is the case. If I always query all N points at a >> given time to draw the line, don't the option 1 and 2 have roughly the same >> performance? >> >> For example, >> >> A) SELECT * FROM "line" ORDER BY "time" DESC LIMIT 1 # with the 1st schema >> B) SELECT * FROM "point" GROUP BY "name" ORDER BY "time" DESC LIMIT 1 # >> with the 2nd schema >> >> I thought A and B scan the same number of series. Am I right? >> >> > You can submit explicit timestamps at write time, rather than letting >> the system determine them. Alternately, if you leave the timestamps out, >> then every point in the batch will get the same timestamp. >> >> True. I just feel a bit uneasy to rely on the assumption that the query B >> always returns all the points consist of a line. Yes, we could use batch >> writing to ensure all points would have the same timestamp and would be >> written at the same time. Whereas in the 1st schema, it is guaranteed that >> relevant points are bundled up in a response by construction, which is >> nice. But I guess this is not a big deal. >> >> > There isn't really a best practice for arrays in InfluxDB. I would >> start by modeling schemas 1 and 2 using the influx_stress tool to generate >> randomized load but with a defined schema >> >> Thank you for the pointer! I'll give it a try. >> >> Regards, >> Mitsutoshi >> >> 2016年10月4日(火) 12:08 Sean Beckett <[email protected]>: >> >>> On Sun, Oct 2, 2016 at 8:24 PM, Mitsutoshi Aoe <[email protected]> wrote: >>> >>>> Hi all, >>>> >>>> I'm now trying to encode a set of time-varying 2D points into >>>> an InfluxDB measurement. >>>> >>>> Suppose we write N data points (p_0 .. p_N-1) on xy-plane frequently >>>> (every second or so). N isn't large (< 20) and may occasionally change over >>>> time (e.g. every few months). The data points represents a line on the >>>> plane over time. We continuously query those data points from InfluxDB to >>>> render the line realtime or at points in time. We usually need the whole >>>> points (p_0..p_N-1) at once and never query a part of them. >>>> >>>> What the best schema for this use case? I can think of a few ideas: >>>> >>>> 1. Encode all the points as fields >>>> >>>> line p0.x=0.0,p0.y=1.0,p1.x=0.1,p1.y=0.2,... >>>> >>>> >>> This has low series cardinality but high field cardinality. The RAM >>> needs of the system would be fairly low, and because each field is densely >>> populated it would compress and query fairly well. There can be performance >>> issues querying many fields at once, but since the field count is less than >>> 40 and they are all floats, it might be okay depending on your query >>> frequency. >>> >>> >>>> 2. Use a tag to distinguish points >>>> >>>> point name=p0 x=0.0,y=1.0 >>>> point name=p1 x=0.1,y=0.2 >>>> >>>> >>> This would potentially lead to high series cardinality, unless the point >>> names don't change over time. It would be quick to return queries and the >>> field set would be small. I don't think we have performance modeling for >>> the tradeoffs between tags and fields at 40+, but this is the schema I >>> would start with, other considerations aside. >>> >>> >>>> 3. Serialize all the points as a string >>>> >>>> line value="[(0.0,1.0),(0.1,0.2)]" >>>> It's not an efficient format but just to sketch the idea. >>>> >>> >>> This would be storing long strings, which is not the best for >>> compressibility or RAM usage. There are also no string functions in >>> InfluxDB like substr or find, so you would always have to return the entire >>> line and work with that. >>> >>> >>>> >>>> 1 looks good. I'm somehow uncomfortable with using fields names to >>>> distinguish points though. I feel better with 2 in this regard. But the >>>> problem with 2 is that reconstructing the line from the points are >>>> unnecessarily complicated: >>>> >>>> 2-A. Each point in the same line can have different timestamps. Whereas >>>> 1 guarantees that all points in the same line have the same timestamp. >>>> >>> >>> You can submit explicit timestamps at write time, rather than letting >>> the system determine them. Alternately, if you leave the timestamps out, >>> then every point in the batch will get the same timestamp. As long as >>> points on lines are all in the same batch they will all have the same >>> timestamp. >>> >>> >>>> 2-B. How much data points do we need to query to draw the current line? >>>> There's no guarantee that fetching N data points covers all data points >>>> that are necessary to reconstruct the line. >>>> >>> >>> This would require careful batching when writing, or using another tag >>> to differentiate the lines from each other. >>> >>> >>>> 3 looks terrible in terms of space efficiency. But it might be easiest >>>> to reconstruct the line if you have a handy text parser. >>>> >>>> It would be ideal if I could just store an array of numbers as a field >>>> value in InfluxDB. But currently there seems to be no such feature. What's >>>> the current best practice? >>>> >>> >>> There isn't really a best practice for arrays in InfluxDB. I would start >>> by modeling schemas 1 and 2 using the influx_stress >>> <https://github.com/influxdata/influxdb/tree/master/stress/v2> tool to >>> generate randomized load but with a defined schema. >>> >>> >>>> >>>> >>>> Thanks, >>>> Mitsutoshi >>>> >>>> -- >>>> Remember to include the InfluxDB version number with all issue reports >>>> --- >>>> You received this message because you are subscribed to the Google >>>> Groups "InfluxDB" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> To post to this group, send email to [email protected]. >>>> Visit this group at https://groups.google.com/group/influxdb. >>>> To view this discussion on the web visit https://groups.google.com/d/ >>>> msgid/influxdb/f2f4bfec-fc87-44b4-a158-262dd657c560%40googlegroups.com >>>> <https://groups.google.com/d/msgid/influxdb/f2f4bfec-fc87-44b4-a158-262dd657c560%40googlegroups.com?utm_medium=email&utm_source=footer> >>>> . >>>> For more options, visit https://groups.google.com/d/optout. >>>> >>> >>> >>> >>> -- >>> Sean Beckett >>> Director of Support and Professional Services >>> InfluxDB >>> >>> -- >>> Remember to include the InfluxDB version number with all issue reports >>> --- >>> You received this message because you are subscribed to the Google >>> Groups "InfluxDB" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> To post to this group, send email to [email protected]. >>> Visit this group at https://groups.google.com/group/influxdb. >>> To view this discussion on the web visit https://groups.google.com/d/ >>> msgid/influxdb/CALGqCvP1%2BddhL%2B%3DGi8H7urCv_ >>> pMCnF37ih87%2BJ36FbTyi%3DN3rg%40mail.gmail.com >>> <https://groups.google.com/d/msgid/influxdb/CALGqCvP1%2BddhL%2B%3DGi8H7urCv_pMCnF37ih87%2BJ36FbTyi%3DN3rg%40mail.gmail.com?utm_medium=email&utm_source=footer> >>> . >>> For more options, visit https://groups.google.com/d/optout. >>> >> -- > Remember to include the InfluxDB version number with all issue reports > --- > You received this message because you are subscribed to the Google Groups > "InfluxDB" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at https://groups.google.com/group/influxdb. > To view this discussion on the web visit https://groups.google.com/d/ > msgid/influxdb/CAMLnt0O9T551FMWY_xxan8d%2BLffto614i0SrOPpBfEYEZq% > 2BH4g%40mail.gmail.com > <https://groups.google.com/d/msgid/influxdb/CAMLnt0O9T551FMWY_xxan8d%2BLffto614i0SrOPpBfEYEZq%2BH4g%40mail.gmail.com?utm_medium=email&utm_source=footer> > . > > For more options, visit https://groups.google.com/d/optout. > -- Sean Beckett Director of Support and Professional Services InfluxDB -- Remember to include the InfluxDB version number with all issue reports --- You received this message because you are subscribed to the Google Groups "InfluxDB" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at https://groups.google.com/group/influxdb. To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/CALGqCvNZ%2Bk%2BxmXOgiiK%3DMHrsuh5EMErjoxtbv6cdZeTNKCW-uQ%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
