[influxdb] Re: Another attempt at reducing Cardinality

Mike Schroll Mon, 08 Aug 2016 22:22:00 -0700

It turns out that normal solution 1 isn't even a stop-gap option, due to Issue 
#7129 <https://github.com/influxdata/influxdb/issues/7129>, which I just filed.


On Monday, August 8, 2016 at 1:57:06 PM UTC-7, Mike Schroll wrote:
>
> I continue my battle with series cardinality issues, while still 
> maintaining summarized data through continuous queries long term.
>
> Here's my latest approach, which doesn't work, and why:
>
> Background: Trying to get counts of a highly variable tag, which contains a 
> domain name. It needs to be a tag to be "grouped by", along with other tags.
>
> Stage 0: At the ingestion level, I now split data into multiple 
> measurements (via multiple UDP listeners): some with low cardinality, to 
> keep 'raw' data forever, and some for various other processing, e.g. 'domains'.
> Stage 1: 'domains' measurement comes in with many tags - high cardinality, 
> but 2 hour retention policy.
> Stage 2: CQ_a: count of Stage 1 data, grouped by tags of interest and 
> time(1h), 2 hour retention policy.
> Stage 3: CQ_b: sum of Stage 2 data, further reducing tags and cardinality 
> with a WHERE constraint, time(1h), 2 day retention policy.
> Stage 4: CQ_c: sum of Stage 3 data, same tags as Stage 3, now with time(1d), 
> 2 day retention policy.
> Stage 5: CQ_d: top(100,domain), selecting many tags to be stored as 
> fields, time(1d), forever retention policy.
>
> Everything works great, until Stage 5. Because it is doing one-day 
> summaries, and I reduced it to only one tag, which is not 'domain', the 
> series is no longer unique enough to accommodate more than one data point 
> per timestamp. Because it's being written by a continuous query, it is 
> grouping on time(1d) and selects into the new measurement with only a 
> single daily timestamp value. As a result, I go from 3500 records to 64 
> (one per value of the single remaining tag), which is not helpful.
>
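A minimal sketch of why the rows collapse, assuming a point is identified by its measurement, tag set, and timestamp, so that a second write with the same identity replaces field values instead of adding a row. The dict below only stands in for the database; the measurement name `top_domains`, the tag `site`, and the sample rows are made up for illustration.

```python
from datetime import datetime, timezone

def write_point(store, measurement, tags, fields, ts):
    """Mimic a write: same measurement + tag set + timestamp means same point."""
    key = (measurement, tuple(sorted(tags.items())), ts)
    # A repeated key does not add a row; new field values replace the old ones.
    store.setdefault(key, {}).update(fields)

# One daily timestamp, as produced by grouping on time(1d).
day = int(datetime(2016, 8, 8, tzinfo=timezone.utc).timestamp())

store = {}
top_rows = [("example.com", 900), ("example.org", 750), ("example.net", 600)]
for domain, count in top_rows:
    # 'domain' is stored as a field, so it is not part of the point's identity.
    write_point(store, "top_domains", {"site": "a"},
                {"domain": domain, "top": count}, day)

print(len(store))            # number of points that survived
print(list(store.values()))  # only the last row written remains
```

With `domain` moved back into the tag dict, each row would get its own key, which is exactly the cardinality cost that solution 1 below runs into.
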
> Normal solutions to this problem would be:
> 1) Add back the domain tag - Not feasible for me due to high cardinality 
> over time.
> 2) Increment timestamp - This is not supported by integrated CQs, and I 
> don't see that it is easily supported by Kapacitor. Mentioned here 
> <https://groups.google.com/d/topic/influxdb/FFMmfTJ2pGg/discussion> and 
> here <https://github.com/influxdata/influxdb/issues/4614>. I think there 
> was a feature request related to this, but it was shot down. I can't find 
> it again, though.
>
> I see two possible solutions:
> i) Using influx to ingest the data, do some summing, and then I'll try to 
> put it into postgresql for long-term retrieval, which should be fine once 
> sufficiently summarized. Maybe using www_fdw 
> <https://github.com/cyga/www_fdw/wiki/Documentation> in postgresql to 
> query influx, otherwise just use an outside script which queries influx, 
> and inserts into postgresql.
> ii) Query the data from the top() command using a script, emulate the 
> actions of the continuous query -- re-inserting it; but modify the daily 
> timestamp so that each item returned from the top(100, domain) has an 
> incremental timestamp: ending in 1,2,3
>
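Option ii could be sketched roughly as follows, assuming the top() results have already been fetched by an outside script and will be written back as line protocol with nanosecond timestamps. The helper names, the measurement `top_domains`, the tag `site`, and the sample rows are hypothetical, and the formatter does no escaping, so it only suits simple values such as plain domain names.

```python
from datetime import datetime, timezone

def offset_points(day_start_ns, rows):
    """Pair each row with its own timestamp: day start + 1ns, + 2ns, ..."""
    return [(day_start_ns + rank, row) for rank, row in enumerate(rows, start=1)]

def to_line(measurement, tag_key, tag_value, row, ts_ns):
    """Format one point as line protocol (no escaping of special characters)."""
    return (f'{measurement},{tag_key}={tag_value} '
            f'domain="{row["domain"]}",count={row["count"]}i {ts_ns}')

# Start of the summarized day, in nanoseconds since the epoch.
day_start_ns = int(datetime(2016, 8, 8, tzinfo=timezone.utc).timestamp()) * 10**9

# Stand-in for what a top(100, domain) query returned for one tag value.
rows = [
    {"domain": "example.com", "count": 900},
    {"domain": "example.org", "count": 750},
    {"domain": "example.net", "count": 600},
]

lines = [to_line("top_domains", "site", "a", row, ts)
         for ts, row in offset_points(day_start_ns, rows)]
for line in lines:
    print(line)
```

Each row keeps the same single tag but lands on a distinct timestamp, so the rows no longer overwrite each other. The trade-off is that the timestamps are no longer exactly on the day boundary, so queries for a day need a time range instead of an exact match.
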
> I'm not sure if this is in any way a 'normal' use of influxdb, whether 
> there's a feature request in here; or if it'd just be solved by the 
> long-term plan to not require all tags to be in RAM, which will reduce 
> (eliminate) the whole cardinality constraint.
>
