Re: [influxdb] Measurement Schema Design

Mathias Herberts Thu, 22 Sep 2016 22:48:59 -0700

I would not use different measurements, as InfluxDB does not allow you to do 
cross-measurement analytics. If you go the multi-measurement way, you won't 
be able to crunch your storage metrics together with your network ones, 
because they would live in two different measurements.
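
As a rough illustration of the point above, here is a minimal sketch that 
builds one InfluxQL query reading two metrics from a single measurement. 
The measurement name "metrics", the one-hour window and the helper 
mean_query are assumptions made for the example; the field keys are the 
ones suggested in the quoted message below.

```python
# Sketch only: "metrics" and the time window are hypothetical names/values.
# Identifiers are assumed to be simple (no embedded double quotes).

def mean_query(measurement, fields, window="1h", interval="1m"):
    """Build one InfluxQL SELECT that aggregates several fields together."""
    cols = ", ".join(f'mean("{field}")' for field in fields)
    return (f'SELECT {cols} FROM "{measurement}" '
            f"WHERE time > now() - {window} GROUP BY time({interval})")

query = mean_query("metrics", ["mongo_throughput", "rest_throughput"])
print(query)
```

With the two metrics split across separate measurements there is no JOIN 
to bring them back into one result like this.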


On Friday, September 23, 2016 at 2:42:49 AM UTC+2, [email protected] wrote:
>
>
> We are also in the process of figuring out how to store data for a 
> multi-tenant system so this is very helpful.  I have a couple of follow up 
> questions along the similar lines (caveat -- we're very new to InfluxDB so 
> I may have misused the terms). 
>
> When thinking about our schema (ignoring tenancy) we were considering 
> having measurements which stored data from a specific sub-system of our 
> application.  So for example we might have one measurement for the data 
> related to our storage sub-system (mongodb) and another for the REST layer 
> and so forth.  These would have fields which stored data for specific 
> metrics and we'd use tags to segment the data.  So we might have a 
> measurement for REST with fields for throughput and response time and tags 
> for HTTP method and relative URI. 
>
> When layering tenancy into this our first inclination was to do so by 
> adding the tenant id to the measurement name (so we'd have XYZ-mongodb and 
> XYZ-rest).  However, based on this discussion it sounds like you'd advise 
> against that in favor of simply having a measurement per tenant and putting 
> all of the metrics in that as fields (which raises the question of why not 
> have 1 measurement in the non-tenant schema).  One issue that arises with 
> that approach is what to do about overlapping field keys (both mongodb and 
> rest have throughput for example).  It seems like we could use either 
> stylized field keys (mongo_throughput and rest_throughput) or we could use 
> tags.  Any thoughts on which would be preferable? 
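
To make the two options in the paragraph above concrete, the sketch below 
renders the same readings as InfluxDB line protocol, once with prefixed 
field keys and once with a tag. The tag key "subsystem", the throughput 
values, the timestamp and the helpers escape_key and to_line are all 
hypothetical; only float field values are handled.

```python
def escape_key(text):
    """Backslash-escape commas, equals signs and spaces, as line protocol
    expects in tag keys, tag values and field keys."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text

def to_line(measurement, tags, fields, timestamp_ns):
    """Render one point as a line protocol string (float fields only)."""
    head = measurement.replace(",", "\\,").replace(" ", "\\ ")
    for key in sorted(tags):
        head += f",{escape_key(key)}={escape_key(tags[key])}"
    body = ",".join(f"{escape_key(k)}={float(v)}"
                    for k, v in sorted(fields.items()))
    return f"{head} {body} {timestamp_ns}"

ts = 1474609739000000000  # arbitrary nanosecond timestamp for the example

# Option A: one point per tenant, overlap resolved by prefixed field keys.
prefixed = to_line("XYZ", {},
                   {"mongo_throughput": 120, "rest_throughput": 340}, ts)

# Option B: one point per sub-system, overlap resolved by a tag.
tagged = [
    to_line("XYZ", {"subsystem": "mongodb"}, {"throughput": 120}, ts),
    to_line("XYZ", {"subsystem": "rest"}, {"throughput": 340}, ts),
]

print(prefixed)
print("\n".join(tagged))
```

Option A keeps everything in one series but multiplies the number of field 
keys; option B keeps a single "throughput" field key and adds one series 
per sub-system.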
>
> Even with using tags to resolve metric overlap I think we'd end up with 
> 1000's of fields and if we used prefixing we'd have 10,000's.  Also, if I 
> understand how writes work we'd have some points that are extremely sparse 
> (those for sub-systems with more specialized data, such as the JVM) and 
> some points with a large number of field values (on the order of 100's to 
> 1000's).  Is this going to cause issues?  We'd also end up doing lots of 
> queries which pull out only a small sub-set of the fields, any concern 
> there? 
>
> Anyway, thanks in advance for any advice. I'm looking forward to trying 
> this out and seeing how it works. 
>
> Sean Fitts 
>
>
> On Friday, August 19, 2016 at 12:27:13 PM UTC-7, Sean Beckett wrote: 
> > There's not much performance gain from segmenting the data. It will all 
> live on the filesystem organized by time first, and series second. As long 
> as your queries are bounded to particular times and series, the measurement 
> schema won't make too much difference. 
> > 
> > 
> > However, DROP MEASUREMENT is more performant than DROP SERIES, so I 
> would think scoping each customer to a measurement (Schema #2) would be 
> beneficial for overall organization.  
> > 
> > 
> > Schema #3 is not a great schema, as it puts important metadata in the 
> measurement name. Typically that's an anti-pattern. Additionally, there are 
> no JOINs across measurements, so you wouldn't be able to query for the 
> COUNT() of all events across a customer if each page ID meant a new 
> measurement. 
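
The COUNT() point above can be sketched with the InfluxQL queries each 
schema would need. The helper count_events is hypothetical, and counting 
the "URL" field assumes every event point carries that field, as in the 
schemas quoted further down.

```python
def count_events(source, group_by_page=False):
    """Build an InfluxQL COUNT query; `source` is an already-quoted
    measurement name or a /regex/ literal."""
    query = f'SELECT COUNT("URL") FROM {source}'
    if group_by_page:
        query += ' GROUP BY "PageId"'
    return query

# Schema #2: one measurement per customer, PageId kept as a tag.
schema2_total = count_events('"XYZ"')
schema2_per_page = count_events('"XYZ"', group_by_page=True)

# Schema #3: one measurement per page; a regex can select them all, but
# the counts come back per measurement rather than as one customer total.
schema3_attempt = count_events("/^XYZ-/")

for q in (schema2_total, schema2_per_page, schema3_attempt):
    print(q)
```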
> > 
> > 
> > On Tue, Aug 2, 2016 at 11:05 PM,  <[email protected]> wrote: 
> > 
> > Hi there, 
> > 
> > 
> > I'm not quite sure which schema design would be better and hoping 
> someone could help: 
> > 
> > 
> > (1)  
> > Measurement = PageViews 
> > 
> > Tags = OrganisationId=XYZ, PageId=123 
> > Values = BrowserAgent=Chrome, URL=test.com 
> > 
> > 
> > 
> > Measurement = Clicks 
> > Tags = OrganisationId=XYZ, PageId=123 
> > Values = BrowserAgent=Chrome, URL=test.com 
> > 
> > 
> > or 
> > 
> > 
> > (2) 
> > Measurement = XYZ (OrganisationID) 
> > 
> > Tags = PageId=123, Event=PageView 
> > Values = BrowserAgent=Chrome, URL=test.com 
> > 
> > 
> > 
> > Measurement = XYZ (OrganisationID) 
> > Tags = PageId=123, Event=Click 
> > Values = BrowserAgent=Chrome, URL=test.com 
> > 
> > 
> > or 
> > 
> > 
> > (3) 
> > Measurement = XYZ-123 (OrganisationId-PageId) 
> > Tags = Event=PageView 
> > 
> > Values = BrowserAgent=Chrome, URL=test.com 
> > 
> > 
> > 
> > Measurement = XYZ-123 (OrganisationId-PageId) 
> > Tags = Event=Click 
> > Values = BrowserAgent=Chrome, URL=test.com 
> > 
> > 
> > This would be used in a multi-tenant environment where each customer 
> (organisation) has their own data. Does the use of an orgid-pageid 
> measurement help the underlying database? 
> > 
> > I.e., with SQL, having a table named OrgId-PageId would restrict the 
> indexing storage/speed to just that scope; however, I'm not sure it 
> would be the same with InfluxDB, as perhaps indexes are based on series 
> (which include the measurement and tags).  
> > 
> > 
> > So then in theory it would be much of a muchness - no performance gain 
> by segmenting data? 
> > 
> > 
> > Ryan 
> > 
> > 
> > 
> > 
> > -- 
> > 
> > Remember to include the InfluxDB version number with all issue reports 
> > 
> > --- 
> > 
> > You received this message because you are subscribed to the Google 
> Groups "InfluxDB" group. 
> > 
> > To unsubscribe from this group and stop receiving emails from it, send 
> an email to [email protected]. 
> > 
> > To post to this group, send email to [email protected]. 
> > 
> > Visit this group at https://groups.google.com/group/influxdb. 
> > 
> > To view this discussion on the web visit 
> https://groups.google.com/d/msgid/influxdb/0100a133-7b23-4d29-aa7a-33e2666991d7%40googlegroups.com.
>  
>
> > 
> > For more options, visit https://groups.google.com/d/optout. 
> > 
> > 
> > 
> > 
> > 
> > -- 
> > 
> > 
> > Sean Beckett 
> > Director of Support and Professional Services 
> > InfluxDB 
>
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/influxdb/7a51a003-eba3-489e-8325-e00b434b74c2%40googlegroups.com.
