RE: how to choose tombstone_failure_threshold value if I want to delete billions of entries?

2020-11-20 Thread Durity, Sean R
The tombstone_failure_threshold setting only applies to reads. If the 
tombstones are in different partitions, and you aren’t doing cross-partition 
reads, you shouldn’t need to adjust that value.
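
For reference, both read-path tombstone limits live in cassandra.yaml; they are 
counted per read, not per table. A minimal sketch of the relevant settings, 
assuming roughly the stock 3.x/4.x defaults (check the values in your own 
version):

    # cassandra.yaml: read-path tombstone limits (per read, not per table)
    tombstone_warn_threshold: 1000       # log a warning when one read scans this many tombstones
    tombstone_failure_threshold: 100000  # abort the read (TombstoneOverwhelmingException) past this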

If disk space recovery is the goal, it depends on how available you need the 
data to be. The faster way is probably to unload the ~2 billion rows you want 
to keep, truncate the table, and reload them. But you might have some data 
unavailable during the reload. Can the app tolerate that? DSBulk can make this 
much faster than older bulk load/unload methods.
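
A rough sketch of that unload/truncate/reload path, assuming dsbulk 1.x syntax 
(the keyspace, table, export directory, and WHERE filter below are placeholders, 
not your actual schema):

    # 1. Unload only the rows you want to keep (the filter is purely illustrative)
    dsbulk unload \
      -query "SELECT * FROM my_ks.my_table WHERE bucket IN (1, 2, 3)" \
      -url /data/export/my_table

    # 2. Truncate the table; the data is unavailable until the reload finishes
    cqlsh -e "TRUNCATE my_ks.my_table;"

    # 3. Reload the kept rows
    dsbulk load -k my_ks -t my_table -url /data/export/my_table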

The tombstone + compaction method will take a while, and could get tricky if 
some nodes don’t have enough free disk for compaction to actually occur. You 
would want to lower gc_grace_seconds to a low (but acceptable) value and 
probably turn on unchecked_tombstone_compaction with a low tombstone_threshold 
(0.1 or lower?). You would probably still need to force a major compaction to 
get rid of data where the tombstones are in different SSTables than the 
original data (assuming size-tiered compaction). This is all much more tedious 
and error-prone, and requires attention to each node. If a node can’t compact, 
you might have to wipe it and rebuild/re-add it to the cluster.
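
As a sketch only (keyspace/table names are placeholders, and 3600 seconds for 
gc_grace_seconds is just an example value; size it to your repair schedule):

    # Lower gc_grace_seconds so tombstones become purgeable sooner (example value)
    cqlsh -e "ALTER TABLE my_ks.my_table WITH gc_grace_seconds = 3600;"

    # Let size-tiered compaction run single-SSTable tombstone compactions aggressively
    cqlsh -e "ALTER TABLE my_ks.my_table WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'unchecked_tombstone_compaction': 'true',
        'tombstone_threshold': '0.1'};"

    # After the deletes are flushed and gc_grace_seconds has elapsed, force a major
    # compaction on each node (this needs enough free disk to rewrite the table)
    nodetool compact my_ks my_table

Keep in mind that a major compaction under size-tiered leaves one very large 
SSTable per node, which is one more reason this route needs more care per node 
than the unload/reload approach.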


Sean Durity

From: Pushpendra Rajpoot 
Sent: Friday, November 20, 2020 10:34 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] how to choose tombstone_failure_threshold value if I want 
to delete billions of entries?

Hi Team,

I have a table with approx 15 billion entries and I want to delete approx 13 
billion entries from it. I cannot write 13 billion tombstones in one go since 
there is a disk space crunch.

I am planning to delete data in chunks, so I will be creating 400 million 
tombstones in one go.

Now, I have 2 questions:

1. What is the optimal value of tombstone_failure_threshold for the above 
scenario?
2. What is the best way to delete 13 billion entries in my case?

Regards,
Pushpendra


