Hi Mike,

> If I run a select statement on a field, RAM usage increases to 56GB, and
> it returns the answers.
> If I try to run a select statement like select *, RAM usage exceeds
> 128GB, swaps another 60GB, never returns.

Have you tried using an aggregate query? SELECT field FROM measurement
GROUP BY tag will be more memory-efficient for this kind of operation and
will also return results per series, which is usually more useful than
having everything returned in one result series. Using an aggregate query
also lets you set the "chunked=true" URL parameter so that results are
returned in segments.
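
As a rough sketch of what a chunked request could look like from a client,
here is a small Python example. The host, database name, measurement, and
tag ("localhost:8086", "mydb", "cpu", "host") are placeholders I made up, and
the parsing step assumes each chunk arrives as one JSON object per line, so
please check that against what your version actually sends.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, db, query, chunked=True):
    """Build a /query URL; adds chunked=true when requested."""
    params = {"db": db, "q": query}
    if chunked:
        params["chunked"] = "true"
    return base_url.rstrip("/") + "/query?" + urlencode(params)


def iter_chunks(lines):
    """Yield one decoded JSON object per non-empty line.

    Assumption: the chunked response is a stream of JSON objects
    separated by newlines.
    """
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if line:
            yield json.loads(line)


# Placeholder names, not taken from your setup.
url = build_query_url(
    "http://localhost:8086",
    "mydb",
    'SELECT "value" FROM "cpu" GROUP BY "host"',
)
```

Processing each chunk as it arrives, rather than collecting the whole
response first, keeps the client side from holding the full result in memory.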

In general, running a non-aggregate query on all 4.2M series at once will
be highly inefficient because all series are merged into one result set.
The query engine loads all series into memory so that it can return
correctly ordered results. This is usually fine for fewer than 100k series,
but it gets increasingly expensive as more series are selected.
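
To illustrate why a merged, time-ordered result needs every series at hand
at once, here is a toy Python model. It is not how the InfluxDB query engine
is implemented; the function name and the sample points are invented purely
for illustration.

```python
import heapq


def merge_series(series):
    """Merge per-series lists of (timestamp, value) into one ordered list.

    Each input list must already be sorted by timestamp. The merge keeps
    a cursor into every series at the same time, so the bookkeeping grows
    with the number of series selected.
    """
    return list(heapq.merge(*series, key=lambda point: point[0]))


# Two made-up series with interleaved timestamps.
series_a = [(1, "a1"), (4, "a2")]
series_b = [(2, "b1"), (3, "b2")]
merged = merge_series([series_a, series_b])
```

With GROUP BY, each series can be returned on its own, so no cross-series
ordering step like this is needed.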

I've opened a feature request to add a limit on the number of series pulled
into memory at once, which will reduce memory usage at the cost of a longer
query time:
https://github.com/influxdata/influxdb/issues/7035

> If I run a select statement on a tag, I get an empty-set returned to me,
> after lots of CPU-chugging; but RAM usage doesn't exceed 56GB.
> {"results":[{}]}

Tags cannot be queried on their own. Currently, a field value needs to be
selected in addition to the tag, or no results will be returned. Here's the
feature request for selecting only a tag:
https://github.com/influxdata/influxdb/issues/5548
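
Until that lands, the workaround is to always name a field next to the tag.
A minimal Python sketch of building such a statement follows; the helper
name and the "cpu", "host", and "value" identifiers are hypothetical, and it
assumes the names contain no double quotes.

```python
def select_with_tag(measurement, tag, field):
    """Build a SELECT that names a field alongside the tag.

    Assumption: identifiers contain no double quotes, so plain
    double-quoting is enough.
    """
    def quote(identifier):
        return '"' + identifier + '"'

    return "SELECT {}, {} FROM {}".format(
        quote(field), quote(tag), quote(measurement)
    )


# Hypothetical names, for illustration only.
statement = select_with_tag("cpu", "host", "value")
```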

> Using show queries and kill query ID, I can successfully kill the select
> * query (no longer shows in show queries) -- however ram usage does not
> drop; and I'm forced to restart influxdb to get back to a base-line state.
> Is this run-away ram usage considered a bug? or expected behavior somehow
> based on cardinality, only triggered by certain select queries?

This sounds like a bug. Killed queries should be cleaned up, and any memory
they allocated should be freed by the garbage collector within a few
minutes. Can you open an issue with a repro case?

Hope that helps!

On Mon, Jul 18, 2016 at 9:49 PM, Mike Schroll wrote:

> I have a DB with high cardinality (in the process of resolving this; but
> trying to export my old data first!)
>
> Per: SELECT sum(numSeries) AS "total_series" FROM "_internal".."database"
> WHERE time > now() - 10s
> I have 4.2M series.
>
> I'm running on a server with 128GB ram.
> Normally, at startup -- it uses 20GB-40GB of ram.
>
> If I run a select statement on a field, RAM usage increases to 56GB, and
> it returns the answers.
>
> If I run a select statement on a tag, I get an empty-set returned to me,
> after lots of CPU-chugging; but RAM usage doesn't exceed 56GB.
> {"results":[{}]}
>
> If I try to run a select statement like select *, RAM usage exceeds 128GB,
> swaps another 60GB, never returns.
>
> I'm running 0.13
>
> I understand high cardinality is 'bad' -- but typically recommendations
> call for just adding more RAM, and hoping for the best. I've not seen
> mention of differentiation in RAM usage based on select parameters, nor the
> behavior of returning an empty-set.
>
> All selects are done on a 10min period from two days ago, with typically
> 3M data points per day. I use 1d shard size. There's one measurement I'm
> doing these selects from. I have a handful of other measurements being
> generated by continuous queries.
>
> Using show queries and kill query ID, I can successfully kill the select *
> query (no longer shows in show queries) -- however ram usage does not drop;
> and I'm forced to restart influxdb to get back to a base-line state.
>
> Is this run-away ram usage considered a bug? or expected behavior somehow
> based on cardinality, only triggered by certain select queries?
>
> I've heard of others with higher series cardinality on the list... never
> heard them speak of issues like this.
>
> Due to this issue, I'm unable to export my data.
>
> Appreciate any insight; happy to debug further if its a bug, and I can be
> of assistance tracking it down (and happy to open a github issue if more
> appropriate than discussion here)
>
> Thanks!
>
> --
> Remember to include the InfluxDB version number with all issue reports
> ---
> You received this message because you are subscribed to the Google Groups
> "InfluxDB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/influxdb.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/influxdb/b705d0bc-473d-46a0-9065-26d93883581c%40googlegroups.com
> <https://groups.google.com/d/msgid/influxdb/b705d0bc-473d-46a0-9065-26d93883581c%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Gunnar Aasen
InfluxDB

