On Sun, Oct 2, 2016 at 8:24 PM, Mitsutoshi Aoe <[email protected]> wrote:
> Hi all,
>
> I'm now trying to encode a set of time-varying 2D points into an InfluxDB
> measurement.
>
> Suppose we write N data points (p_0 .. p_N-1) on the xy-plane frequently
> (every second or so). N isn't large (< 20) and may occasionally change over
> time (e.g. every few months). The data points represent a line on the
> plane over time. We continuously query those data points from InfluxDB to
> render the line in real time or at points in time. We usually need all the
> points (p_0..p_N-1) at once and never query a part of them.
>
> What is the best schema for this use case? I can think of a few ideas:
>
> 1. Encode all the points as fields
>
> line p0.x=0.0,p0.y=1.0,p1.x=0.1,p1.y=0.2,...
>

This has low series cardinality but high field cardinality. The RAM needs
of the system would be fairly low, and because each field is densely
populated it would compress and query fairly well. There can be performance
issues querying many fields at once, but since the field count is less than
40 and they are all floats, it might be okay depending on your query
frequency.

> 2. Use a tag to distinguish points
>
> point,name=p0 x=0.0,y=1.0
> point,name=p1 x=0.1,y=0.2
>

This would potentially lead to high series cardinality if the point names
change over time; if they stay fixed, cardinality stays low. Queries would
return quickly and the field set would be small. I don't think we have
performance modeling for the tradeoffs between tags and fields at 40+, but
this is the schema I would start with, other considerations aside.

> 3. Serialize all the points as a string
>
> line value="[(0.0,1.0),(0.1,0.2)]"
> It's not an efficient format but just to sketch the idea.
>

This would be storing long strings, which is not the best for
compressibility or RAM usage. There are also no string functions in
InfluxDB like substr or find, so you would always have to return the entire
line and work with that.

> 1 looks good. I'm somewhat uncomfortable with using field names to
> distinguish points though. I feel better with 2 in this regard. But the
> problem with 2 is that reconstructing the line from the points is
> unnecessarily complicated:
>
> 2-A. Each point in the same line can have a different timestamp, whereas 1
> guarantees that all points in the same line have the same timestamp.
>

You can submit explicit timestamps at write time, rather than letting the
system determine them. Alternatively, if you leave the timestamps out, then
every point in the batch will get the same timestamp. As long as the points
on a line are all in the same batch, they will all have the same timestamp.

> 2-B. How many data points do we need to query to draw the current line?
> There's no guarantee that fetching N data points covers all data points
> that are necessary to reconstruct the line.
>

This would require careful batching when writing, or using another tag to
differentiate the lines from each other.

> 3 looks terrible in terms of space efficiency. But it might be easiest to
> reconstruct the line if you have a handy text parser.
>
> It would be ideal if I could just store an array of numbers as a field
> value in InfluxDB. But currently there seems to be no such feature. What's
> the current best practice?
>

There isn't really a best practice for arrays in InfluxDB. I would start by
modeling schemas 1 and 2 using the influx_stress
<https://github.com/influxdata/influxdb/tree/master/stress/v2> tool to
generate randomized load but with a defined schema.

> Thanks,
> Mitsutoshi
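
To make the timestamp advice for 2-A and 2-B concrete, here is a minimal Python sketch, not a definitive implementation. It builds line-protocol rows for schemas 1 and 2 with one explicit shared timestamp, then regroups schema-2 rows by that timestamp to recover each line. The helper names (`encode_schema1`, `encode_schema2`, `rebuild_lines`) and the `(time, name, x, y)` row shape are assumptions made for illustration; they are not part of any InfluxDB client library.

```python
from collections import defaultdict


def encode_schema1(points, ts_ns):
    """Schema 1: one row per line, every coordinate stored as its own field."""
    fields = ",".join(
        f"p{i}.x={float(x)},p{i}.y={float(y)}" for i, (x, y) in enumerate(points)
    )
    return f"line {fields} {ts_ns}"


def encode_schema2(points, ts_ns):
    """Schema 2: one row per point, tagged by name, all sharing one timestamp."""
    return [
        f"point,name=p{i} x={float(x)},y={float(y)} {ts_ns}"
        for i, (x, y) in enumerate(points)
    ]


def rebuild_lines(rows):
    """Group schema-2 rows (time, name, x, y) into lines keyed by timestamp.

    Assumes names have the hypothetical form "p<index>" used above.
    """
    by_time = defaultdict(dict)
    for ts, name, x, y in rows:
        by_time[ts][int(name[1:])] = (x, y)
    # Order each line's points by their index so p0 comes first.
    return {ts: [pts[i] for i in sorted(pts)] for ts, pts in by_time.items()}


if __name__ == "__main__":
    points = [(0.0, 1.0), (0.1, 0.2)]
    ts = 1475400000000000000  # explicit nanosecond timestamp shared by the line
    print(encode_schema1(points, ts))
    print("\n".join(encode_schema2(points, ts)))
```

Because every row of a line carries the same explicit timestamp, grouping query results by time recovers whole lines without needing to know N in advance.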
--
Sean Beckett
Director of Support and Professional Services
InfluxDB
