Hi Sean,

Thank you for your reply.

Firstly, I miscalculated the number of points N. N is more like 90. So if I take the first route, the number of fields would be about 90+.
> It would be quick to return queries and the field set would be small.

I'm not sure why this is the case. If I always query all N points at a given time to draw the line, don't options 1 and 2 have roughly the same performance? For example,

    A) SELECT * FROM "line" ORDER BY "time" DESC LIMIT 1  # with the 1st schema
    B) SELECT * FROM "point" GROUP BY "name" ORDER BY "time" DESC LIMIT 1  # with the 2nd schema

I thought A and B scan the same number of series. Am I right?

> You can submit explicit timestamps at write time, rather than letting the system determine them. Alternately, if you leave the timestamps out, then every point in the batch will get the same timestamp.

True. I just feel a bit uneasy relying on the assumption that query B always returns all the points that make up a line. Yes, we could use batch writing to ensure all points have the same timestamp and are written at the same time. In the 1st schema, by contrast, the relevant points are guaranteed by construction to be bundled together in a response, which is nice. But I guess this is not a big deal.

> There isn't really a best practice for arrays in InfluxDB. I would start by modeling schemas 1 and 2 using the influx_stress tool to generate randomized load but with a defined schema

Thank you for the pointer! I'll give it a try.

Regards,
Mitsutoshi

On Tue, Oct 4, 2016 at 12:08, Sean Beckett <[email protected]> wrote:

> On Sun, Oct 2, 2016 at 8:24 PM, Mitsutoshi Aoe <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm now trying to encode a set of time-varying 2D points into an InfluxDB measurement.
>>
>> Suppose we write N data points (p_0 .. p_N-1) on the xy-plane frequently (every second or so). N isn't large (< 20) and may occasionally change over time (e.g. every few months). The data points represent a line on the plane over time. We continuously query those data points from InfluxDB to render the line in real time or at given points in time. We usually need all the points (p_0 .. p_N-1) at once and never query only part of them.
>> What is the best schema for this use case? I can think of a few ideas:
>>
>> 1. Encode all the points as fields
>>
>>     line p0.x=0.0,p0.y=1.0,p1.x=0.1,p1.y=0.2,...
>
> This has low series cardinality but high field cardinality. The RAM needs of the system would be fairly low, and because each field is densely populated it would compress and query fairly well. There can be performance issues querying many fields at once, but since the field count is less than 40 and they are all floats, it might be okay depending on your query frequency.
>
>> 2. Use a tag to distinguish points
>>
>>     point,name=p0 x=0.0,y=1.0
>>     point,name=p1 x=0.1,y=0.2
>
> This would potentially lead to high series cardinality, unless the point names don't change over time. It would be quick to return queries and the field set would be small. I don't think we have performance modeling for the tradeoffs between tags and fields at 40+, but this is the schema I would start with, other considerations aside.
>
>> 3. Serialize all the points as a string
>>
>>     line value="[(0.0,1.0),(0.1,0.2)]"
>>
>> It's not an efficient format but just to sketch the idea.
>
> This would be storing long strings, which is not the best for compressibility or RAM usage. There are also no string functions in InfluxDB like substr or find, so you would always have to return the entire line and work with that.
>
>> 1 looks good. I'm somewhat uncomfortable with using field names to distinguish points, though. I feel better about 2 in this regard. But the problem with 2 is that reconstructing the line from the points is unnecessarily complicated:
>>
>> 2-A. Each point in the same line can have a different timestamp, whereas 1 guarantees that all points in the same line have the same timestamp.
>
> You can submit explicit timestamps at write time, rather than letting the system determine them. Alternately, if you leave the timestamps out, then every point in the batch will get the same timestamp. As long as points on lines are all in the same batch they will all have the same timestamp.
>
>> 2-B.
>> How many data points do we need to query to draw the current line? There's no guarantee that fetching N data points covers all the data points necessary to reconstruct the line.
>
> This would require careful batching when writing, or using another tag to differentiate the lines from each other.
>
>> 3 looks terrible in terms of space efficiency. But it might be the easiest one to reconstruct the line from if you have a handy text parser.
>>
>> It would be ideal if I could just store an array of numbers as a field value in InfluxDB. But currently there seems to be no such feature. What's the current best practice?
>
> There isn't really a best practice for arrays in InfluxDB. I would start by modeling schemas 1 and 2 using the influx_stress <https://github.com/influxdata/influxdb/tree/master/stress/v2> tool to generate randomized load but with a defined schema.
>
>> Thanks,
>> Mitsutoshi
>>
>> --
>> Remember to include the InfluxDB version number with all issue reports
>> ---
>> You received this message because you are subscribed to the Google Groups "InfluxDB" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at https://groups.google.com/group/influxdb.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/influxdb/f2f4bfec-fc87-44b4-a158-262dd657c560%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> Sean Beckett
> Director of Support and Professional Services
> InfluxDB
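
For anyone comparing schemas 1 and 2 from this thread, here is a minimal sketch of how the two encodings and the "is the line complete?" check from 2-B might look. It uses plain Python and no InfluxDB client. The function names (encode_schema1, encode_schema2, rebuild_lines) and the (time, name, x, y) row shape are assumptions made for illustration, not anything taken from the thread or from an InfluxDB library. The measurement, tag, and field names follow the sketches in the original post.

```python
from collections import defaultdict


def encode_schema1(points, ts_ns):
    """Schema 1: one 'line' record whose fields are p0.x, p0.y, p1.x, ..."""
    fields = ",".join(
        f"p{i}.x={float(x)},p{i}.y={float(y)}" for i, (x, y) in enumerate(points)
    )
    return f"line {fields} {ts_ns}"


def encode_schema2(points, ts_ns):
    """Schema 2: one 'point' record per point, tagged by name.

    Every record carries the same explicit timestamp, so the points of
    one line can be matched up again at read time.
    """
    return [
        f"point,name=p{i} x={float(x)},y={float(y)} {ts_ns}"
        for i, (x, y) in enumerate(points)
    ]


def rebuild_lines(rows, expected_n):
    """Group schema-2 rows by timestamp and keep only complete lines.

    `rows` is assumed to be an iterable of (time, name, x, y) tuples,
    i.e. query results already flattened by the caller.
    """
    by_time = defaultdict(dict)
    for ts, name, x, y in rows:
        by_time[ts][name] = (x, y)

    complete = {}
    for ts, named in by_time.items():
        if len(named) == expected_n:
            # Order points by the numeric index in their name ("p<i>").
            ordered = sorted(named, key=lambda n: int(n[1:]))
            complete[ts] = [named[k] for k in ordered]
    return complete
```

With schema 1 the completeness check is unnecessary, since a single record already holds the whole line. With schema 2 the reader has to decide what to do with timestamps that have fewer than the expected number of points; the sketch simply drops them.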
