Turns out normal solution 1 (adding back the domain tag) isn't even a stop-gap option, due to issue #7129 <https://github.com/influxdata/influxdb/issues/7129>, which I just filed.
On Monday, August 8, 2016 at 1:57:06 PM UTC-7, Mike Schroll wrote:
>
> I continue my battle with series cardinality issues, while still
> maintaining summarized data through continuous queries long term.
>
> Here's my latest approach, which doesn't work, and why:
>
> Background: I'm trying to get counts of a highly variable tag that contains
> a domain name. It needs to be a tag so it can be "grouped by", along with
> other tags.
>
> Stage 0: At ingestion, I now split data into multiple measurements (via
> multiple UDP listeners): some with low cardinality, to keep 'raw' data
> forever, and some for various other processing, e.g. 'domains'.
> Stage 1: The 'domains' measurement comes in with many tags (high
> cardinality), but has a 2 hour retention policy.
> Stage 2: CQ_a: count of Stage 1 data, grouped by the tags of interest and
> time(1h), 2 hour retention policy.
> Stage 3: CQ_b: sum of Stage 2 data, further reducing tags and cardinality
> with a WHERE constraint, time(1h), 2 day retention policy.
> Stage 4: CQ_c: sum of Stage 3 data, same tags as Stage 3, now with
> time(1d), 2 day retention policy.
> Stage 5: CQ_d: top(100, domain), selecting many tags to be stored as
> fields, time(1d), forever retention policy.
>
> Everything works great until Stage 5. Because it produces one-day
> summaries, and I reduced it to only one tag (which is not 'domain'), the
> series is no longer unique enough to accommodate more than one data point
> per timestamp. Because it's written by a continuous query, it groups on
> time(1d) and selects into the new measurement with a single daily
> timestamp value. As a result, I go from 3500 records to 64 (one per value
> of the single remaining tag), which is not helpful.
>
> Normal solutions to this problem would be:
> 1) Add back the domain tag - not feasible for me due to high cardinality
> over time.
> 2) Increment the timestamp - this is not supported by integrated CQs, and I
> don't see that it is easily supported by Kapacitor.
> Mentioned here <https://groups.google.com/d/topic/influxdb/FFMmfTJ2pGg/discussion>
> and here <https://github.com/influxdata/influxdb/issues/4614>. I think there
> was a related feature request, but it was shot down. I can't find it
> again, though.
>
> I see two possible solutions:
> i) Use influx to ingest the data and do some summing, then try to put it
> into PostgreSQL for long-term retrieval, which should be fine once
> sufficiently summarized. Maybe use www_fdw
> <https://github.com/cyga/www_fdw/wiki/Documentation> in PostgreSQL to
> query influx; otherwise just use an outside script that queries influx
> and inserts into PostgreSQL.
> ii) Query the data from the top() command using a script and emulate the
> actions of the continuous query by re-inserting it, but modify the daily
> timestamp so that each item returned from top(100, domain) has an
> incremental timestamp ending in 1, 2, 3.
>
> I'm not sure if this is in any way a 'normal' use of InfluxDB, whether
> there's a feature request in here, or if it'd just be solved by the
> long-term plan to not require all tags to be in RAM, which will reduce
> (eliminate) the whole cardinality constraint.
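For anyone trying option ii, here is a rough Python sketch under some assumptions: the top(100, domain) rows have already been fetched by a query script and are passed in as plain dicts, the measurement name `top_domains` and tag key `site` are hypothetical, and the output is InfluxDB line protocol that the script would then POST to the /write endpoint (not shown). Each row gets the day's timestamp plus a 1-based nanosecond offset, so rows sharing the one remaining tag no longer collapse into a single point. The hypothetical `distinct_points` helper approximates the overwrite rule behind the 3500-to-64 drop: points with the same measurement, tag set, and timestamp are stored as one.

```python
# Sketch of "option ii": re-insert top(100, domain) results with incremental
# timestamps. Measurement/tag/field names are hypothetical examples.


def escape_measurement(name):
    """Line protocol: escape commas and spaces in measurement names."""
    return name.replace(",", r"\,").replace(" ", r"\ ")


def escape_tag(value):
    """Line protocol: escape commas, equals signs and spaces in keys and tag values."""
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def format_field(value):
    """Render a field value: boolean, integer (i suffix), float, or quoted string."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, ts_ns):
    """Build one line: measurement,tags fields timestamp (tags sorted by key)."""
    head = escape_measurement(measurement)
    for key, val in sorted(tags.items()):
        head += f",{escape_tag(key)}={escape_tag(str(val))}"
    body = ",".join(f"{escape_tag(k)}={format_field(v)}" for k, v in fields.items())
    return f"{head} {body} {ts_ns}"


def rows_to_lines(rows, measurement, tag_keys, day_start_ns):
    """Stamp each row with the day's start plus a 1-based nanosecond offset,
    so rows sharing the same tag values get distinct timestamps."""
    lines = []
    for offset, row in enumerate(rows, start=1):
        tags = {k: row[k] for k in tag_keys}
        fields = {k: v for k, v in row.items() if k not in tag_keys}
        lines.append(to_line_protocol(measurement, tags, fields, day_start_ns + offset))
    return lines


def distinct_points(measurement, rows, tag_keys, timestamps):
    """Approximate InfluxDB's overwrite rule: points with the same measurement,
    tag set and timestamp are stored as a single point."""
    keys = {
        (measurement, tuple(sorted((k, row[k]) for k in tag_keys)), ts)
        for row, ts in zip(rows, timestamps)
    }
    return len(keys)


if __name__ == "__main__":
    day_start_ns = 1470614400 * 10**9  # 2016-08-08T00:00:00Z in nanoseconds
    rows = [
        {"site": "a", "domain": "example.com", "count": 42},
        {"site": "a", "domain": "example.org", "count": 17},
    ]
    # Same daily timestamp for both rows: they collapse into one stored point.
    print(distinct_points("top_domains", rows, ["site"], [day_start_ns] * 2))
    # Offset timestamps keep both rows as separate points.
    for line in rows_to_lines(rows, "top_domains", ["site"], day_start_ns):
        print(line)
```

The timestamps are in nanoseconds, which is the line protocol's default precision; if the script writes with a coarser precision, day_start_ns and the offsets would need to be scaled to that unit. Since the offsets are tiny, later GROUP BY time(1d) queries still place all rows in the same day.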
