Re: [influxdb] Measurement Schema Design

smfitts Fri, 23 Sep 2016 06:48:41 -0700

Interesting.  Does this fact lead to most schemas consisting of a single 
measurement?  And just to be clear, the restriction is that you can't join 
between them in the DB itself, correct?  Presumably you can issue queries 
against each so you can display them simultaneously (say in Grafana).
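
A minimal sketch of that client-side alternative, assuming each measurement 
has already been queried separately and the results are in hand as lists of 
(timestamp, value) rows.  The measurement names mongodb and rest, the field 
throughput, and the helper merge_by_time are hypothetical, and no InfluxDB 
client call is shown; this only illustrates lining the two result sets up by 
time outside the database.

```python
from collections import defaultdict


def merge_by_time(**series):
    """Combine per-measurement (timestamp, value) rows into one row per timestamp."""
    merged = defaultdict(dict)
    for name, rows in series.items():
        for ts, value in rows:
            merged[ts][name] = value
    names = list(series)
    # Sort by timestamp; a measurement with no point at a timestamp gets None.
    return [(ts, {n: merged[ts].get(n) for n in names}) for ts in sorted(merged)]


# Hypothetical results of two separate queries, for example:
#   SELECT throughput FROM mongodb WHERE time > now() - 1h
#   SELECT throughput FROM rest    WHERE time > now() - 1h
mongodb_rows = [(1474638000, 120.0), (1474638060, 135.0)]
rest_rows = [(1474638000, 48.0), (1474638120, 51.0)]

combined = merge_by_time(mongodb=mongodb_rows, rest=rest_rows)
for ts, values in combined:
    print(ts, values)
```

Anything computed across the two measurements (ratios, sums, correlations) 
would then be done on the merged rows rather than inside the DB.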


Thanks again for the insight.

Sean

On Thursday, September 22, 2016 at 10:48:37 PM UTC-7, Mathias Herberts wrote:
> I would not use different measurements, as InfluxDB does not allow you to do 
> cross-measurement analytics. If you go the multi-measurement way, you 
> won't be able to crunch your storage metrics together with your network 
> ones.
> 
> On Friday, September 23, 2016 at 2:42:49 AM UTC+2, [email protected] wrote:
> 
> We are also in the process of figuring out how to store data for a 
> multi-tenant system, so this is very helpful.  I have a couple of follow-up 
> questions along similar lines (caveat: we're very new to InfluxDB, so I 
> may have misused the terms).
> 
> 
> 
> When thinking about our schema (ignoring tenancy) we were considering having 
> measurements which each store data from a specific sub-system of our 
> application.  So, for example, we might have one measurement for the data 
> related to our storage sub-system (mongodb), another for the REST layer, and 
> so forth.  These would have fields which store data for specific metrics, and 
> we'd use tags to segment the data.  So we might have a measurement for rest 
> with fields for throughput and response time and tags for HTTP method and 
> relative URI.
> 
> 
> 
> When layering tenancy into this, our first inclination was to do so by adding 
> the tenant id to the measurement name (so we'd have XYZ-mongodb and 
> XYZ-rest).  However, based on this discussion it sounds like you'd advise 
> against that in favor of simply having a measurement per tenant and putting 
> all of the metrics in that as fields (which raises the question of why not 
> have 1 measurement in the non-tenant schema).  One issue that arises with 
> that approach is what to do about overlapping field keys (both mongodb and 
> rest have throughput, for example).  It seems like we could use either 
> stylized field keys (mongo_throughput and rest_throughput) or we could use 
> tags.  Any thoughts on which would be preferable?
> 
> 
> 
> Even using tags to resolve metric overlap, I think we'd end up with 
> 1000s of fields, and if we used prefixing we'd have 10,000s.  Also, if I 
> understand how writes work, we'd have some points that are extremely sparse 
> (those for sub-systems with more specialized data, such as the JVM) and some 
> points with a large number of field values (on the order of 100s to 1000s). 
> Is this going to cause issues?  We'd also end up doing lots of queries which 
> pull out only a small subset of the fields; any concern there?
> 
> 
> 
> Anyway, thanks in advance for any advice. I'm looking forward to trying this 
> out and seeing how it works.
> 
> 
> 
> Sean Fitts
> 
> 
> 
> 
> 
> On Friday, August 19, 2016 at 12:27:13 PM UTC-7, Sean Beckett wrote:
> 
> > There's not much performance gain from segmenting the data. It will all 
> > live on the filesystem organized by time first, and series second. As long 
> > as your queries are bounded to particular times and series, the measurement 
> > schema won't make too much difference.
> 
> > 
> 
> > 
> 
> > However, DROP MEASUREMENT is more performant than DROP SERIES, so I would 
> > think scoping each customer to a measurement (Schema #2) would be 
> > beneficial for overall organization. 
> 
> > 
> 
> > 
> 
> > Schema #3 is not a great schema, as it puts important metadata in the 
> > measurement name. Typically that's an anti-pattern. Additionally, there are 
> > no JOINs across measurements, so you wouldn't be able to query for the 
> > COUNT() of all events across a customer if each page ID meant a new 
> > measurement.
> 
> > 
> 
> > 
> 
> > On Tue, Aug 2, 2016 at 11:05 PM,  <[email protected]> wrote:
> 
> > 
> 
> > Hi there,
> 
> > 
> 
> > 
> 
> > I'm not quite sure which schema design would be better and hoping someone 
> > could help:
> 
> > 
> 
> > 
> 
> > (1)
> > Measurement = PageViews
> > Tags = OrganisationId=XYZ, PageId=123
> > Values = BrowserAgent=Chrome, URL=test.com
> > 
> > Measurement = Clicks
> > Tags = OrganisationId=XYZ, PageId=123
> > Values = BrowserAgent=Chrome, URL=test.com
> > 
> > or
> > 
> > (2)
> > Measurement = XYZ (OrganisationID)
> > Tags = PageId=123, Event=PageView
> > Values = BrowserAgent=Chrome, URL=test.com
> > 
> > Measurement = XYZ (OrganisationID)
> > Tags = PageId=123, Event=Click
> > Values = BrowserAgent=Chrome, URL=test.com
> > 
> > or
> > 
> > (3)
> > Measurement = XYZ-123 (OrganisationId-PageId)
> > Tags = Event=PageView
> > Values = BrowserAgent=Chrome, URL=test.com
> > 
> > Measurement = XYZ-123 (OrganisationId-PageId)
> > Tags = Event=Click
> > Values = BrowserAgent=Chrome, URL=test.com
> 
> > 
> 
> > 
> 
> > This would be used in a multi-tenant environment where each customer 
> > (organisation) has their own data. Does the use of an orgid-pageid 
> > measurement help the underlying database?
> 
> > 
> 
> > I.e., with SQL, having a table named OrgId-PageId would restrict the 
> > index storage and lookup cost to just that scope. However, I'm not sure it 
> > would be the same with InfluxDB, as perhaps indexes are based on series 
> > (which include measurement and tags). 
> 
> > 
> 
> > 
> 
> > So then in theory it would be much of a muchness - no performance gain by 
> > segmenting data?
> 
> > 
> 
> > 
> 
> > Ryan
> 
> > 
> 
> > 
> 
> > 
> 
> > 
> 
> > -- 
> 
> > 
> 
> > 
> 
> > Sean Beckett
> 
> > Director of Support and Professional Services
> 
> > InfluxDB

-- 
Remember to include the InfluxDB version number with all issue reports
--- 
You received this message because you are subscribed to the Google Groups 
"InfluxDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/influxdb.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/influxdb/4d7c4721-a0e0-4b21-b9dc-061a0ac76214%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
