Re: [influxdb] Schema design for arrays of points

Sean Beckett Mon, 03 Oct 2016 20:08:57 -0700

On Sun, Oct 2, 2016 at 8:24 PM, Mitsutoshi Aoe <[email protected]> wrote:


> Hi all,
>
> I'm now trying to encode a set of time-varying 2D points into an InfluxDB
> measurement.
>
> Suppose we write N data points (p_0 .. p_N-1) on the xy-plane frequently
> (every second or so). N isn't large (< 20) and may occasionally change over
> time (e.g. every few months). The data points represent a line on the
> plane over time. We continuously query those data points from InfluxDB to
> render the line in real time or at given points in time. We usually need
> all the points (p_0 .. p_N-1) at once and never query only some of them.
>
> What's the best schema for this use case? I can think of a few ideas:
>
> 1. Encode all the points as fields
>
> line p0.x=0.0,p0.y=1.0,p1.x=0.1,p1.y=0.2,...
>
>
This has low series cardinality but high field cardinality. The RAM needs
of the system would be fairly low, and because each field is densely
populated it would compress and query fairly well. There can be performance
issues querying many fields at once, but since the field count is less than
40 and they are all floats, it might be okay depending on your query
frequency.
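
As a rough sketch of schema 1, the snippet below builds one line-protocol
record per snapshot, with a pair of float fields for each point. The helper
name encode_line_as_fields and its parameters are made up for illustration;
only the "line" measurement and the p<i>.x / p<i>.y field names come from the
example above.

```python
def encode_line_as_fields(points, measurement="line", timestamp_ns=None):
    """Build a single line-protocol record with p<i>.x / p<i>.y fields.

    Hypothetical helper: `points` is a list of (x, y) pairs.
    """
    fields = []
    for i, (x, y) in enumerate(points):
        # float() ensures integers are written as floats (e.g. 1 -> 1.0),
        # so every field keeps a consistent float type.
        fields.append(f"p{i}.x={float(x)!r}")
        fields.append(f"p{i}.y={float(y)!r}")
    record = f"{measurement} {','.join(fields)}"
    if timestamp_ns is not None:
        # Optional explicit timestamp in nanoseconds since the epoch.
        record += f" {timestamp_ns}"
    return record


print(encode_line_as_fields([(0.0, 1.0), (0.1, 0.2)]))
# line p0.x=0.0,p0.y=1.0,p1.x=0.1,p1.y=0.2
```

With N < 20 points this yields at most 40 fields in a single series, which is
the field count discussed above.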


> 2. Use a tag to distinguish points
>
> point,name=p0 x=0.0,y=1.0
> point,name=p1 x=0.1,y=0.2
>
>
This could lead to high series cardinality if the point names change over
time. If the names stay fixed, there is only one series per point. Queries
would return quickly and the field set would be small. I don't think we have
performance modeling for the tradeoffs between tags and fields at 40+, but
this is the schema I would start with, other considerations aside.
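
To make the cardinality point concrete, here is a small sketch that counts
the distinct series (measurement plus tag set) in a set of line-protocol
records. It assumes the tag is attached to the measurement with a comma, as
line protocol requires, and that nothing needs escaping. The series_keys
helper and the sample timestamps are invented for the example.

```python
def series_keys(records):
    """Collect the distinct series keys (measurement plus tag set).

    Simplified: assumes no escaped spaces or commas in names or tag values,
    so the series key is everything before the first space.
    """
    return {record.split(" ", 1)[0] for record in records}


# Three snapshots of a two-point line, written one second apart
# (timestamps are arbitrary nanosecond values).
records = [
    f"point,name=p{i} x={x},y={y} {ts}"
    for ts in (1_000_000_000, 2_000_000_000, 3_000_000_000)
    for i, (x, y) in enumerate([(0.0, 1.0), (0.1, 0.2)])
]

print(sorted(series_keys(records)))
# ['point,name=p0', 'point,name=p1']
```

Six records are written, but as long as the names p0 .. p_N-1 are reused the
series count stays at N, no matter how many snapshots are stored.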


> 3. Serialize all the points as a string
>
> line value="[(0.0,1.0),(0.1,0.2)]"
> It's not an efficient format but just to sketch the idea.
>

This would be storing long strings, which is not the best for
compressibility or RAM usage. There are also no string functions in
InfluxDB like substr or find, so you would always have to return the entire
line and work with that.
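
Since the parsing would have to happen on the client, a minimal sketch of
that step might look like the following. It assumes the string uses exactly
the format in the example above, which happens to be a valid Python literal;
decode_line is a hypothetical helper, not part of any InfluxDB client.

```python
import ast


def decode_line(value):
    """Parse a serialized line such as "[(0.0,1.0),(0.1,0.2)]" into (x, y) pairs."""
    # ast.literal_eval only evaluates literals, so it is safe to use on
    # strings read back from the database (unlike eval).
    points = ast.literal_eval(value)
    return [(float(x), float(y)) for x, y in points]


print(decode_line("[(0.0,1.0),(0.1,0.2)]"))
# [(0.0, 1.0), (0.1, 0.2)]
```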


>
> 1 looks good. I'm somewhat uncomfortable with using field names to
> distinguish points though. I feel better with 2 in this regard. But the
> problem with 2 is that reconstructing the line from the points is
> unnecessarily complicated:
>
> 2-A. Each point in the same line can have a different timestamp, whereas 1
> guarantees that all points in the same line have the same timestamp.
>

You can submit explicit timestamps at write time, rather than letting the
system determine them. Alternatively, if you leave the timestamps out, then
every point in the batch will get the same timestamp. As long as all the
points of a line are in the same batch, they will share a timestamp.
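
A sketch of the explicit-timestamp approach for schema 2: every point of one
line is written into the same batch with the same nanosecond timestamp. The
build_batch helper is hypothetical, and the timestamp in the usage example is
an arbitrary value.

```python
import time


def build_batch(points, timestamp_ns=None, measurement="point"):
    """Build a line-protocol batch in which all points share one timestamp."""
    if timestamp_ns is None:
        # Take the clock reading once so every record gets the same value.
        timestamp_ns = time.time_ns()
    return "\n".join(
        f"{measurement},name=p{i} x={float(x)!r},y={float(y)!r} {timestamp_ns}"
        for i, (x, y) in enumerate(points)
    )


print(build_batch([(0.0, 1.0), (0.1, 0.2)], timestamp_ns=1475000000000000000))
# point,name=p0 x=0.0,y=1.0 1475000000000000000
# point,name=p1 x=0.1,y=0.2 1475000000000000000
```

The resulting text would be sent as the body of a single write request, so
the points of a line are never split across batches.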


> 2-B. How many data points do we need to query to draw the current line?
> There's no guarantee that fetching N data points covers all data points
> that are necessary to reconstruct the line.
>

This would require careful batching when writing, or using another tag to
differentiate the lines from each other.
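
If each line is written with one shared timestamp, the client can rebuild
lines by grouping the returned rows on that timestamp. The sketch below
assumes rows arrive as dictionaries with "time", "name", "x" and "y" keys and
that names follow the p<index> pattern; that row shape is an assumption for
illustration, not the format of any particular client library.

```python
from collections import defaultdict


def reconstruct_lines(rows):
    """Group rows by timestamp and order each group by point index."""
    grouped = defaultdict(list)
    for row in rows:
        index = int(row["name"][1:])  # "p12" -> 12
        grouped[row["time"]].append((index, row["x"], row["y"]))
    # Sort by index so the points come back in drawing order.
    return {
        ts: [(x, y) for _, x, y in sorted(points)]
        for ts, points in grouped.items()
    }


rows = [
    {"time": 2, "name": "p1", "x": 0.3, "y": 0.4},
    {"time": 1, "name": "p1", "x": 0.1, "y": 0.2},
    {"time": 1, "name": "p0", "x": 0.0, "y": 1.0},
    {"time": 2, "name": "p0", "x": 0.2, "y": 0.9},
]
print(reconstruct_lines(rows)[1])
# [(0.0, 1.0), (0.1, 0.2)]
```

If a separate tag is used to tell lines apart, that tag value would become
part of the grouping key alongside (or instead of) the timestamp.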


> 3 looks terrible in terms of space efficiency. But it might be easiest to
> reconstruct the line if you have a handy text parser.
>
> It would be ideal if I could just store an array of numbers as a field
> value in InfluxDB. But currently there seems to be no such feature. What's
> the current best practice?
>

There isn't really a best practice for arrays in InfluxDB. I would start by
modeling schemas 1 and 2 using the influx_stress
<https://github.com/influxdata/influxdb/tree/master/stress/v2> tool to
generate randomized load but with a defined schema.


>
>
> Thanks,
> Mitsutoshi
>
> --
> Remember to include the InfluxDB version number with all issue reports
> ---
> You received this message because you are subscribed to the Google Groups
> "InfluxDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/influxdb.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/influxdb/f2f4bfec-fc87-44b4-a158-262dd657c560%40googlegroups.com
> <https://groups.google.com/d/msgid/influxdb/f2f4bfec-fc87-44b4-a158-262dd657c560%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Sean Beckett
Director of Support and Professional Services
InfluxDB

