Re: [influxdb] Measurement Schema Design

Sean Fitts Fri, 30 Sep 2016 08:57:47 -0700

Sean, hi.

Thanks for the responses.  If you have the time I have some follow-ups 
below...

On Thursday, September 29, 2016 at 8:25:23 PM UTC-7, Sean Beckett wrote:
>
> For multi-tenant my first thought is each tenant gets their own database. 
> It does lead to significant series duplication, but it makes for performant 
> add and remove tenant operations. If the cardinality gets too high, some 
> databases can be backed up and restored into a new instance. 
>

Do you have any experience with what a reasonable database cardinality is?
Are we talking 100s, 1,000s, 10,000s?  Is the primary issue here going
to be the number of open files?

>
> Within the database, have a measurement per sub-system, unless of course 
> you want to enable queries across sub-systems. Otherwise store the 
> subsystem as a tag. That would require unique field names for each 
> subsystem. Field cardinality is not a significant concern, unless there are 
> only a few values per field, per shard. 
>

When you say that storing subsystems as tags would require unique field
names for each subsystem, I'm not sure I understand why.  If 2 subsystems
share a particular metric (response time), couldn't there be one field for
that, tagged with the subsystem name?  IIUC that would result in 2 series,
one for each subsystem.  Note that I'm not sure we'll do this because I 
currently don't see the need to aggregate data across sub-systems, but I 
want to make sure I understand how tags work.
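
To make the question concrete, here is a minimal sketch of how I picture it.
It is a toy model, not InfluxDB code: the series_key helper, the "metrics"
measurement, and the subsystem names are all made up for illustration, and
the key layout simply follows the "measurement + tagset + field" description.

```python
from collections import defaultdict


def series_key(measurement, tags, field):
    """Hypothetical key: measurement + sorted tag set + field name."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_part}#{field}"


# Two subsystems report the same field name, distinguished only by a tag.
points = [
    ("metrics", {"subsystem": "auth"}, {"response_time": 12.5}),
    ("metrics", {"subsystem": "billing"}, {"response_time": 48.0}),
    ("metrics", {"subsystem": "auth"}, {"response_time": 11.9}),
]

# Group every field value under its series key.
series = defaultdict(list)
for measurement, tags, fields in points:
    for field, value in fields.items():
        series[series_key(measurement, tags, field)].append(value)

series_keys = sorted(series)
print(series_keys)
```

In this model the shared field name yields one series per subsystem tag
value, which is what I would expect if field names did not have to be unique.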

>
> You can review the Storage Engine 
> <http://docs.influxdata.com/influxdb/v1.0/concepts/storage_engine/#compression>
>  
> doc for more about field density in TSM files. Writing very sparse fields 
> is not recommended, but querying only a few fields per query is fine. Each 
> measurement + tagset + field is stored in its own series (columnar storage).
>

Thanks, that provides a good high-level overview.  I'm curious about the
comment wrt writing sparse fields.  Given that both the cache and the TSM
files appear to treat each series as an independent entity, I wouldn't
think it would matter how sparse either the fields or the points were
(unless it is recording data for the "gaps").  Sparse points might imply
more points, which I'm guessing could impact the WAL (which, if I'm reading
the doc correctly, appears to be a log of the received points).  Clearly I'm
missing something.
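
For reference, this is the mental model I'm working from, as a small sketch.
It assumes each series is nothing more than an independent list of
(timestamp, value) pairs; that is my assumption, not a description of how the
cache or TSM files actually work. The group_by_series helper and the sample
points are hypothetical.

```python
from collections import defaultdict


def group_by_series(points):
    """Toy store: one independent list of (timestamp, value) per series key."""
    store = defaultdict(list)
    for ts, measurement, tags, fields in points:
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
        for field, value in fields.items():
            store[f"{measurement},{tag_part}#{field}"].append((ts, value))
    return store


# "response_time" is written on every point; "errors" only occasionally.
points = [
    (1, "metrics", {"subsystem": "auth"}, {"response_time": 12.5}),
    (2, "metrics", {"subsystem": "auth"}, {"response_time": 11.9, "errors": 1}),
    (3, "metrics", {"subsystem": "auth"}, {"response_time": 13.1}),
]

store = group_by_series(points)
dense = store["metrics,subsystem=auth#response_time"]
sparse = store["metrics,subsystem=auth#errors"]
print(len(dense), len(sparse))
```

Under that model the sparse field just ends up as a short list with nothing
recorded for the gaps, which is why I don't see where the cost comes from.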

Thanks again for your help.

Sean

