Re: [influxdb] Measurement Schema Design

smfitts Thu, 22 Sep 2016 17:43:17 -0700

We are also in the process of figuring out how to store data for a multi-tenant 
system, so this is very helpful.  I have a couple of follow-up questions along 
similar lines (caveat: we're very new to InfluxDB, so I may have misused 
the terms).

When thinking about our schema (ignoring tenancy), we were considering having 
measurements that each store data from a specific sub-system of our 
application.  So, for example, we might have one measurement for the data 
related to our storage sub-system (mongodb), another for the REST layer, and 
so forth.  These would have fields that store the values of specific metrics, 
and we'd use tags to segment the data.  So we might have a measurement for rest 
with fields for throughput and response time and tags for HTTP method and 
relative URI.
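
To make the rest example concrete, here is a minimal sketch of how one such 
point could be rendered in InfluxDB line protocol.  The helper names 
(to_line, format_field), the tag keys (method, uri) and the sample values are 
assumptions made up for illustration, not part of any client library.

```python
def _escape(text, chars):
    """Backslash-escape each character in `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, or quoted strings."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Build one line-protocol line: measurement,tags fields timestamp."""
    tag_part = "".join(
        f",{_escape(k, ', =')}={_escape(v, ', =')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k, ', =')}={format_field(v)}" for k, v in fields.items()
    )
    return f"{_escape(measurement, ', ')}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical sample point for the "rest" measurement described above.
print(to_line(
    "rest",
    {"method": "GET", "uri": "/users"},
    {"throughput": 120.0, "response_time": 35.5},
    1474591397000000000,
))
# prints: rest,method=GET,uri=/users throughput=120.0,response_time=35.5 1474591397000000000
```

Each distinct combination of measurement and tag values forms its own series, 
so the tags here (HTTP method and URI) are what segment the data.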

When layering tenancy into this, our first inclination was to do so by adding 
the tenant id to the measurement name (so we'd have XYZ-mongodb and XYZ-rest).  
However, based on this discussion, it sounds like you'd advise against that in 
favor of simply having a measurement per tenant and putting all of the metrics 
in that as fields (which raises the question of why not have 1 measurement in 
the non-tenant schema).  One issue that arises with that approach is what to do 
about overlapping field keys (both mongodb and rest have throughput, for 
example).  It seems like we could use either stylized field keys 
(mongo_throughput and rest_throughput) or we could use tags.  Any thoughts on 
which would be preferable?
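
The two options can be sketched side by side as line protocol.  This is only 
an illustration under assumptions: the tag key "subsystem", the function names 
and the sample numbers are hypothetical, and escaping is omitted because the 
sample names contain no spaces, commas or equals signs.

```python
def prefixed_line(tenant, metrics_by_subsystem, ts):
    """Option A: one point per tenant; the sub-system is folded into each field key."""
    fields = ",".join(
        f"{subsystem}_{name}={value}"
        for subsystem, metrics in metrics_by_subsystem.items()
        for name, value in metrics.items()
    )
    return f"{tenant} {fields} {ts}"


def tagged_lines(tenant, metrics_by_subsystem, ts):
    """Option B: one point per sub-system; field keys stay short and may repeat."""
    return [
        f"{tenant},subsystem={subsystem} "
        + ",".join(f"{name}={value}" for name, value in metrics.items())
        + f" {ts}"
        for subsystem, metrics in metrics_by_subsystem.items()
    ]


# Hypothetical sample metrics for tenant XYZ.
sample = {
    "mongo": {"throughput": 950.0},
    "rest": {"throughput": 120.0, "response_time": 35.5},
}
ts = 1474591397000000000

print(prefixed_line("XYZ", sample, ts))
for line in tagged_lines("XYZ", sample, ts):
    print(line)
```

With option A the measurement accumulates one field key per sub-system and 
metric pair; with option B the same field key (throughput) is shared and the 
sub-system tag tells the series apart.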

Even when using tags to resolve metric overlap, I think we'd end up with 
thousands of fields, and if we used prefixing we'd have tens of thousands.  
Also, if I understand how writes work, we'd have some points that are extremely 
sparse (those for sub-systems with more specialized data, such as the JVM) and 
some points with a large number of field values (on the order of hundreds to 
thousands).  Is this going to cause issues?  We'd also end up doing lots of 
queries that pull out only a small subset of the fields.  Is there any concern 
there?

Anyway, thanks in advance for any advice. I'm looking forward to trying this 
out and seeing how it works.

Sean Fitts

On Friday, August 19, 2016 at 12:27:13 PM UTC-7, Sean Beckett wrote:
> There's not much performance gain from segmenting the data. It will all live 
> on the filesystem organized by time first, and series second. As long as your 
> queries are bounded to particular times and series, the measurement schema 
> won't make too much difference.
> 
> 
> However, DROP MEASUREMENT is more performant than DROP SERIES, so I would 
> think scoping each customer to a measurement (Schema #2) would be beneficial 
> for overall organization. 
> 
> 
> Schema #3 is not a great schema, as it puts important metadata in the 
> measurement name. Typically that's an anti-pattern. Additionally, there are 
> no JOINs across measurements, so you wouldn't be able to query for the 
> COUNT() of all events across a customer if each page ID meant a new 
> measurement.
> 
> 
> On Tue, Aug 2, 2016 at 11:05 PM,  <[email protected]> wrote:
> 
> Hi there,
> 
> 
> I'm not quite sure which schema design would be better and hoping someone 
> could help:
> 
> 
> (1) 
> Measurement = PageViews
> 
> Tags = OrganisationId=XYZ, PageId=123
> Values = BrowserAgent=Chrome, URL=test.com
> 
> 
> 
> Measurement = Clicks
> Tags = OrganisationId=XYZ, PageId=123
> Values = BrowserAgent=Chrome, URL=test.com
> 
> 
> or
> 
> 
> (2)
> Measurement = XYZ (OrganisationID)
> 
> Tags = PageId=123, Event=PageView
> Values = BrowserAgent=Chrome, URL=test.com
> 
> 
> 
> Measurement = XYZ (OrganisationID)
> Tags = PageId=123, Event=Click
> Values = BrowserAgent=Chrome, URL=test.com
> 
> 
> or
> 
> 
> (3)
> Measurement = XYZ-123 (OrganisationId-PageId)
> Tags = Event=PageView
> 
> Values = BrowserAgent=Chrome, URL=test.com
> 
> 
> 
> Measurement = XYZ-123 (OrganisationId-PageId)
> Tags = Event=Click
> Values = BrowserAgent=Chrome, URL=test.com
> 
> 
> This would be used in a multi-tenant environment where each customer 
> (organisation) has their own data. Does the use of an orgid-pageid measurement 
> help the underlying database?
> 
> I.e., with SQL, having a table named OrgId-PageId would restrict the 
> indexing storage/speed to just that scope; however, I'm not sure it 
> would be the same with InfluxDB, as perhaps indexes are based on series (which 
> include measurement and tags). 
> 
> 
> So then in theory it would be much of a muchness - no performance gain by 
> segmenting data?
> 
> 
> Ryan
> 
> 
> 
> 
> -- 
> 
> Remember to include the InfluxDB version number with all issue reports
> 
> --- 
> 
> You received this message because you are subscribed to the Google Groups 
> "InfluxDB" group.
> 
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> 
> To post to this group, send email to [email protected].
> 
> Visit this group at https://groups.google.com/group/influxdb.
> 
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/influxdb/0100a133-7b23-4d29-aa7a-33e2666991d7%40googlegroups.com.
> 
> For more options, visit https://groups.google.com/d/optout.
> 
> 
> 
> 
> 
> -- 
> 
> 
> Sean Beckett
> Director of Support and Professional Services
> InfluxDB

