For any delete, a tombstone is written [1]. The act of writing a tombstone itself isn't very expensive, but a tombstone will cause a full compaction [2], which is relatively expensive.
Since the compaction is going to be triggered after dropping the series anyway, go ahead and issue all the deletes at the same time. If you spread them out, you will just cause multiple compactions over time instead of one single compaction.

Rather than querying each host separately, you can use the last() function [3] to get the timestamp of the most recent point for a field. Grouping by the host tag lets a single query determine all the hosts that have stopped reporting, and restricting it to a time range keeps the query fast and the response small. With a minimal amount of post-processing it should be easy to generate the correct DROP SERIES statement.

    > select last(usage_user) from cpu where time > '2016-12-21T16:18:30Z' and time < '2016-12-21T16:20:00Z' group by host
    name: cpu
    tags: host=dead_host
    time                    last
    ----                    ----
    2016-12-21T16:18:34Z    0     <---- last reported time close to start of time window

    name: cpu
    tags: host=live_host
    time                    last
    ----                    ----
    2016-12-21T16:19:58Z    0     <---- last reported time close to end of time window

[1] https://docs.influxdata.com/influxdb/v1.1/concepts/storage_engine/#deletes
[2] https://docs.influxdata.com/influxdb/v1.1/concepts/storage_engine/#compactions
[3] https://docs.influxdata.com/influxdb/v1.1/query_language/functions/#last

On Wed, Dec 21, 2016 at 7:09 AM, <[email protected]> wrote:
> Thank you for such a detailed answer.
>
> I'm still working out what our policies should be, but:
> - most of our data is 'host specific', so when a host is removed, its data
> is nearly entirely useless (not 100% true, but nearly)
> - we're quite tight with disk space
> - we'd love to have 13 months of data available
>
> I may end up putting the important data into specific shards, and then all
> the other stuff into shorter-lived shards, but until then, I thought it
> might be useful to delete the old data.
>
> One use case where deleting the old data is useful is in Grafana templated
> dashboards.
> If you have a template variable called 'host' which does
> something like:
>
> SHOW TAG VALUES FROM system WITH KEY=host
>
> ...then it'll show all the hosts that have ever reported stats. If you
> then select one that's been dead a while, you (seem to) get no results,
> which confuses people. Since the data is of low value, deleting the host's
> data means no more confusion. I'd agree that a better solution would be to
> somehow filter those hosts from the template variable (but I can't see how
> we could do that).
>
> Having looked at this some more and your answer, I'm thinking I'll write
> some sort of batch job which will:
>
> - Get a list of all known hosts
> - query each one for any stats in the last (say) week
> - For all hostnames that don't return any data, delete them
>
> Since deletion isn't cheap, I'm wondering if I could do them a chunk at a
> time. If I can, then I could just delete them slowly with multiple small
> deletes. A quick look at it suggests DROP SERIES is my only choice, so at
> off-peak times is probably my only option.
>
> Thanks again for your help :-)
>
> --
> Remember to include the version number!
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "InfluxData" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/influxdb/YYfUx4RbCAI/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/influxdb.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/influxdb/e165a95b-981c-4dc8-b573-327a6afe4b57%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
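The post-processing step mentioned in the reply can be sketched in Python. This is a minimal illustration under stated assumptions, not an official InfluxData tool: the function names (stale_hosts, quote_tag_value, drop_series_batch) are hypothetical, the last-seen timestamps are assumed to have already been parsed out of the grouped select last(...) ... group by host response, and "gone_host" is a made-up host that reported nothing inside the queried window. Instead of querying each host separately, it compares one grouped result against a cutoff and emits all the DROP SERIES statements as a single semicolon-separated batch, so they can be sent together in one request.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List


def stale_hosts(all_hosts: Iterable[str],
                last_seen: Dict[str, datetime],
                cutoff: datetime) -> List[str]:
    """Return the hosts to drop, sorted and de-duplicated.

    all_hosts: every host tag value (e.g. from SHOW TAG VALUES ... WITH KEY = "host").
    last_seen: host -> timestamp of its last point, assumed to be parsed from
               SELECT last(...) ... GROUP BY host over a recent time window.
    A host is stale if its last point is older than cutoff, or if it
    reported nothing at all inside the queried window (no row returned).
    """
    return sorted(h for h in set(all_hosts)
                  if h not in last_seen or last_seen[h] < cutoff)


def quote_tag_value(value: str) -> str:
    """Render a tag value as a single-quoted InfluxQL string literal.

    Assumption: backslash and single quote are escaped with a backslash.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def drop_series_batch(hosts: Iterable[str]) -> str:
    """Join one DROP SERIES statement per host with semicolons.

    With no FROM clause, each statement drops that host's series from every
    measurement in the database. Returns "" when there is nothing to drop.
    """
    return "; ".join(
        'DROP SERIES WHERE "host" = ' + quote_tag_value(h) for h in hosts
    )


if __name__ == "__main__":
    utc = timezone.utc
    # Timestamps from the example query output; "gone_host" is hypothetical.
    last_seen = {
        "dead_host": datetime(2016, 12, 21, 16, 18, 34, tzinfo=utc),
        "live_host": datetime(2016, 12, 21, 16, 19, 58, tzinfo=utc),
    }
    window_end = datetime(2016, 12, 21, 16, 20, 0, tzinfo=utc)
    cutoff = window_end - timedelta(minutes=1)
    hosts = stale_hosts(["dead_host", "live_host", "gone_host"], last_seen, cutoff)
    print(drop_series_batch(hosts))
```

The full host list is passed in because a host that stopped reporting before the window began returns no row at all from the grouped query, so looking only at the last() results would miss it. Add FROM <measurement> to the generated statements if you only want to drop data from specific measurements, and check the string escaping against your InfluxDB version before relying on it for unusual host names.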
