Re: [influxdb] Schema design for arrays of points

Mitsutoshi Aoe Mon, 03 Oct 2016 21:35:45 -0700

Sorry, I had a miscalculation again!

> Fistly, I had a miscalculation on the number of points N. N is more like
90. So if I take the first route, the number of fields would be about 90+.


I meant to say the number of fields would be about 180+.

Mitsutoshi

2016年10月4日(火) 13:32 Mitsutoshi Aoe <[email protected]>:

> Hi Sean,
> Thank you for your reply.
>
> Fistly, I had a miscalculation on the number of points N. N is more like
> 90. So if I take the first route, the number of fields would be about 90+.
>
>
> > It would be quick to return queries and the field set would be small.
>
> I'm not sure why this is the case. If I always query all N points at a
> given time to draw the line, don't the option 1 and 2 have roughly the same
> performance?
>
> For example,
>
> A) SELECT * FROM "line" ORDER BY "time" DESC LIMIT 1 # with the 1st schema
> B) SELECT * FROM "point" GROUP BY "name" ORDER BY "time" DESC LIMIT 1 #
> with the 2nd schema
>
> I thought A and B scan the same number of series. Am I right?
>
> > You can submit explicit timestamps at write time, rather than letting
> the system determine them. Alternately, if you leave the timestamps out,
> then every point in the batch will get the same timestamp.
>
> True. I just feel a bit uneasy to rely on the assumption that the query B
> always returns all the points consist of a line. Yes, we could use batch
> writing to ensure all points would have the same timestamp and would be
> written at the same time. Whereas in the 1st schema, it is guaranteed that
> relevant points are bundled up in a response by construction, which is
> nice. But I guess this is not a big deal.
>
> > There isn't really a best practice for arrays in InfluxDB. I would start
> by modeling schemas 1 and 2 using the influx_stress tool to generate
> randomized load but with a defined schema
>
> Thank you for the pointer! I'll give it a try.
>
> Regards,
> Mitsutoshi
>
> 2016年10月4日(火) 12:08 Sean Beckett <[email protected]>:
>
> On Sun, Oct 2, 2016 at 8:24 PM, Mitsutoshi Aoe <[email protected]> wrote:
>
> Hi all,
>
> I'm now trying to encode a set of time-varying 2D points into an InfluxDB
> measurement.
>
> Suppose we write N data points (p_0 .. p_N-1) on xy-plane frequently
> (every second or so). N isn't large (< 20) and may occasionally change over
> time (e.g. every few months). The data points represents a line on the
> plane over time. We continuously query those data points from InfluxDB to
> render the line realtime or at points in time. We usually need the whole
> points (p_0..p_N-1) at once and never query a part of them.
>
> What the best schema for this use case? I can think of a few ideas:
>
> 1. Encode all the points as fields
>
> line p0.x=0.0,p0.y=1.0,p1.x=0.1,p1.y=0.2,...
>
>
> This has low series cardinality but high field cardinality. The RAM needs
> of the system would be fairly low, and because each field is densely
> populated it would compress and query fairly well. There can be performance
> issues querying many fields at once, but since the field count is less than
> 40 and they are all floats, it might be okay depending on your query
> frequency.
>
>
> 2. Use a tag to distinguish points
>
> point name=p0 x=0.0,y=1.0
> point name=p1 x=0.1,y=0.2
>
>
> This would potentially lead to high series cardinality, unless the point
> names don't change over time. It would be quick to return queries and the
> field set would be small. I don't think we have performance modeling for
> the tradeoffs between tags and fields at 40+, but this is the schema I
> would start with, other considerations aside.
>
>
> 3. Serialize all the points as a string
>
> line value="[(0.0,1.0),(0.1,0.2)]"
> It's not an efficient format but just to sketch the idea.
>
>
> This would be storing long strings, which is not the best for
> compressibility or RAM usage. There are also no string functions in
> InfluxDB like substr or find, so you would always have to return the entire
> line and work with that.
>
>
>
> 1 looks good. I'm somehow uncomfortable with using fields names to
> distinguish points though. I feel better with 2 in this regard. But the
> problem with 2 is that reconstructing the line from the points are
> unnecessarily complicated:
>
> 2-A. Each point in the same line can have different timestamps. Whereas 1
> guarantees that all points in the same line have the same timestamp.
>
>
> You can submit explicit timestamps at write time, rather than letting the
> system determine them. Alternately, if you leave the timestamps out, then
> every point in the batch will get the same timestamp. As long as points on
> lines are all in the same batch they will all have the same timestamp.
>
>
> 2-B. How much data points do we need to query to draw the current line?
> There's no guarantee that fetching N data points covers all data points
> that are necessary to reconstruct the line.
>
>
> This would require careful batching when writing, or using another tag to
> differentiate the lines from each other.
>
>
> 3 looks terrible in terms of space efficiency. But it might be easiest to
> reconstruct the line if you have a handy text parser.
>
> It would be ideal if I could just store an array of numbers as a field
> value in InfluxDB. But currently there seems to be no such feature. What's
> the current best practice?
>
>
> There isn't really a best practice for arrays in InfluxDB. I would start
> by modeling schemas 1 and 2 using the influx_stress
> <https://github.com/influxdata/influxdb/tree/master/stress/v2> tool to
> generate randomized load but with a defined schema.
>
>
>
>
> Thanks,
> Mitsutoshi
>
> --
> Remember to include the InfluxDB version number with all issue reports
> ---
> You received this message because you are subscribed to the Google Groups
> "InfluxDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/influxdb.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/influxdb/f2f4bfec-fc87-44b4-a158-262dd657c560%40googlegroups.com
> <https://groups.google.com/d/msgid/influxdb/f2f4bfec-fc87-44b4-a158-262dd657c560%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
>
>
>
> --
> Sean Beckett
> Director of Support and Professional Services
> InfluxDB
>
> --
> Remember to include the InfluxDB version number with all issue reports
> ---
> You received this message because you are subscribed to the Google Groups
> "InfluxDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/influxdb.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/influxdb/CALGqCvP1%2BddhL%2B%3DGi8H7urCv_pMCnF37ih87%2BJ36FbTyi%3DN3rg%40mail.gmail.com
> <https://groups.google.com/d/msgid/influxdb/CALGqCvP1%2BddhL%2B%3DGi8H7urCv_pMCnF37ih87%2BJ36FbTyi%3DN3rg%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
>

-- 
Remember to include the InfluxDB version number with all issue reports
--- 
You received this message because you are subscribed to the Google Groups 
"InfluxDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/influxdb/CAMLnt0O9T551FMWY_xxan8d%2BLffto614i0SrOPpBfEYEZq%2BH4g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: [influxdb] Schema design for arrays of points

Reply via email to