Sorry, I had a miscalculation again!

> Firstly, I had a miscalculation on the number of points N. N is more like 90. So if I take the first route, the number of fields would be about 90+.

I meant to say the number of fields would be about 180+.

Mitsutoshi

On Tue, Oct 4, 2016 at 13:32, Mitsutoshi Aoe <[email protected]> wrote:

> Hi Sean,
>
> Thank you for your reply.
>
> Firstly, I had a miscalculation on the number of points N. N is more like
> 90. So if I take the first route, the number of fields would be about 90+.
>
> > It would be quick to return queries and the field set would be small.
>
> I'm not sure why this is the case. If I always query all N points at a
> given time to draw the line, don't options 1 and 2 have roughly the same
> performance?
>
> For example,
>
> A) SELECT * FROM "line" ORDER BY "time" DESC LIMIT 1 # with the 1st schema
> B) SELECT * FROM "point" GROUP BY "name" ORDER BY "time" DESC LIMIT 1 # with the 2nd schema
>
> I thought A and B scan the same number of series. Am I right?
>
> > You can submit explicit timestamps at write time, rather than letting
> > the system determine them. Alternately, if you leave the timestamps out,
> > then every point in the batch will get the same timestamp.
>
> True. I just feel a bit uneasy relying on the assumption that query B
> always returns all the points that make up a line. Yes, we could use batch
> writing to ensure all points have the same timestamp and are written at
> the same time. In the 1st schema, by contrast, it is guaranteed by
> construction that the relevant points are bundled up in one response,
> which is nice. But I guess this is not a big deal.
>
> > There isn't really a best practice for arrays in InfluxDB. I would start
> > by modeling schemas 1 and 2 using the influx_stress tool to generate
> > randomized load but with a defined schema
>
> Thank you for the pointer! I'll give it a try.
>
> Regards,
> Mitsutoshi
>
> On Tue, Oct 4, 2016 at 12:08, Sean Beckett <[email protected]> wrote:
>
> > On Sun, Oct 2, 2016 at 8:24 PM, Mitsutoshi Aoe <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > I'm now trying to encode a set of time-varying 2D points into an InfluxDB
> > > measurement.
> > >
> > > Suppose we write N data points (p_0 .. p_N-1) on the xy-plane frequently
> > > (every second or so). N isn't large (< 20) and may occasionally change over
> > > time (e.g. every few months). The data points represent a line on the
> > > plane over time. We continuously query those data points from InfluxDB to
> > > render the line in real time or at points in time. We usually need all the
> > > points (p_0..p_N-1) at once and never query a part of them.
> > >
> > > What's the best schema for this use case? I can think of a few ideas:
> > >
> > > 1. Encode all the points as fields
> > >
> > > line p0.x=0.0,p0.y=1.0,p1.x=0.1,p1.y=0.2,...
> >
> > This has low series cardinality but high field cardinality. The RAM needs
> > of the system would be fairly low, and because each field is densely
> > populated it would compress and query fairly well. There can be performance
> > issues querying many fields at once, but since the field count is less than
> > 40 and they are all floats, it might be okay depending on your query
> > frequency.
> >
> > > 2. Use a tag to distinguish points
> > >
> > > point name=p0 x=0.0,y=1.0
> > > point name=p1 x=0.1,y=0.2
> >
> > This would potentially lead to high series cardinality, unless the point
> > names don't change over time. It would be quick to return queries and the
> > field set would be small. I don't think we have performance modeling for
> > the tradeoffs between tags and fields at 40+, but this is the schema I
> > would start with, other considerations aside.
> >
> > > 3. Serialize all the points as a string
> > >
> > > line value="[(0.0,1.0),(0.1,0.2)]"
> > >
> > > It's not an efficient format but just to sketch the idea.
> >
> > This would be storing long strings, which is not the best for
> > compressibility or RAM usage. There are also no string functions in
> > InfluxDB like substr or find, so you would always have to return the entire
> > line and work with that.
> >
> > > 1 looks good. I'm somehow uncomfortable with using field names to
> > > distinguish points though. I feel better with 2 in this regard.
> > > But the problem with 2 is that reconstructing the line from the points is
> > > unnecessarily complicated:
> > >
> > > 2-A. Each point in the same line can have a different timestamp, whereas 1
> > > guarantees that all points in the same line have the same timestamp.
> >
> > You can submit explicit timestamps at write time, rather than letting the
> > system determine them. Alternately, if you leave the timestamps out, then
> > every point in the batch will get the same timestamp. As long as points on
> > lines are all in the same batch they will all have the same timestamp.
> >
> > > 2-B. How many data points do we need to query to draw the current line?
> > > There's no guarantee that fetching N data points covers all data points
> > > that are necessary to reconstruct the line.
> >
> > This would require careful batching when writing, or using another tag to
> > differentiate the lines from each other.
> >
> > > 3 looks terrible in terms of space efficiency. But it might be easiest to
> > > reconstruct the line if you have a handy text parser.
> > >
> > > It would be ideal if I could just store an array of numbers as a field
> > > value in InfluxDB. But currently there seems to be no such feature. What's
> > > the current best practice?
> >
> > There isn't really a best practice for arrays in InfluxDB. I would start
> > by modeling schemas 1 and 2 using the influx_stress
> > <https://github.com/influxdata/influxdb/tree/master/stress/v2> tool to
> > generate randomized load but with a defined schema.
> >
> > > Thanks,
> > > Mitsutoshi
> > >
> > > --
> > > Remember to include the InfluxDB version number with all issue reports
> > > ---
> > > You received this message because you are subscribed to the Google Groups
> > > "InfluxDB" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an
> > > email to [email protected].
> > > To post to this group, send email to [email protected].
> > > Visit this group at https://groups.google.com/group/influxdb.
> > > To view this discussion on the web visit
> > > https://groups.google.com/d/msgid/influxdb/f2f4bfec-fc87-44b4-a158-262dd657c560%40googlegroups.com.
> > > For more options, visit https://groups.google.com/d/optout.
> >
> > --
> > Sean Beckett
> > Director of Support and Professional Services
> > InfluxDB
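
As a rough illustration of the write side of schemas 1 and 2 discussed above, here is a minimal Python sketch that builds the line protocol entries with one explicit timestamp shared by every point of a line. The helper names (encode_schema1, encode_schema2, field_count) and the timestamp value are made up for illustration and are not part of any InfluxDB client. Note that line protocol attaches tags to the measurement name with a comma (point,name=p0 ...), which the shorthand in the thread leaves out.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def encode_schema1(points: List[Point], ts_ns: int) -> str:
    """Schema 1: a single 'line' entry with every coordinate as its own field."""
    fields = ",".join(
        f"p{i}.x={x},p{i}.y={y}" for i, (x, y) in enumerate(points)
    )
    return f"line {fields} {ts_ns}"


def encode_schema2(points: List[Point], ts_ns: int) -> List[str]:
    """Schema 2: one 'point' entry per point, distinguished by the 'name' tag.

    Every entry carries the same explicit timestamp, so all points of one
    line can later be matched on that timestamp.
    """
    return [
        f"point,name=p{i} x={x},y={y} {ts_ns}"
        for i, (x, y) in enumerate(points)
    ]


def field_count(n_points: int) -> int:
    """Number of fields schema 1 needs: one x and one y field per point."""
    return 2 * n_points


if __name__ == "__main__":
    pts = [(0.0, 1.0), (0.1, 0.2)]
    ts = 1475555520000000000  # hypothetical timestamp, in nanoseconds

    print(encode_schema1(pts, ts))
    for entry in encode_schema2(pts, ts):
        print(entry)
    print(field_count(90))
```

With N = 90, schema 1 ends up with 2 * 90 = 180 fields in a single entry, while schema 2 writes 90 entries of 2 fields each.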
