Re: can I kill very old data files in my data folder (I know that sounds crazy but....)
...and temporarily adding more nodes and rebalancing is not an option?

On Wed, Jun 18, 2014 at 9:39 PM, Brian Tarbox tar...@cabotresearch.com wrote:
> I don't think I have the space to run a major compaction right now (I'm above 50% disk space used already), and compaction can take extra space, I think?
>
> On Wed, Jun 18, 2014 at 3:24 PM, Robert Coli rc...@eventbrite.com wrote:
>> On Wed, Jun 18, 2014 at 12:05 PM, Brian Tarbox tar...@cabotresearch.com wrote:
>>> Thank you! We are not using TTL; we're manually deleting data more than 5 days old for this CF. We're running 1.2.13 and are using size tiered compaction (this CF is append-only, i.e. zero updates). Sounds like we can get away with doing a (stop, delete old-data-file, restart) process on a rolling basis, if I understand you.
>>
>> Sure, though in your case (because you're using STS and can) I'd probably just run a major compaction.
>>
>> =Rob
can I kill very old data files in my data folder (I know that sounds crazy but....)
I have a column family that only stores the last 5 days' worth of some data...and yet I have files in the data directory for this CF that are 3 weeks old. They take the form:

keyspace-CFName-ic--Filter.db
keyspace-CFName-ic--Index.db
keyspace-CFName-ic--Data.db
keyspace-CFName-ic--Statistics.db
keyspace-CFName-ic--TOC.txt
keyspace-CFName-ic--Summary.db

I have six bunches of these file groups, each with a different value...and with timestamps of each of the last five days...plus one group from 3 weeks ago...which makes me wonder if that group somehow should have been deleted but was not. The files are tens or hundreds of gigs, so deleting would be good, unless it's really bad!

Thanks,
Brian Tarbox
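
For anyone wanting to see the same picture on their own node, here is a minimal Python sketch that groups the component files by SSTable and reports each group's age and size. It assumes the 1.2-era naming keyspace-CFName-ic-<generation>-Component (the generation number is an assumption, since it is not visible in the listing above); the function names are illustrative only.

```python
import re
import sys
import time
from collections import defaultdict
from pathlib import Path

# Assumed naming: <keyspace>-<cf>-ic-<generation>-<Component>
SSTABLE_RE = re.compile(r"^[^-]+-[^-]+-ic-(?P<gen>\d+)-.+$")


def group_sstables(data_dir):
    """Map generation number -> list of component file paths in data_dir."""
    groups = defaultdict(list)
    for path in Path(data_dir).iterdir():
        match = SSTABLE_RE.match(path.name)
        if match and path.is_file():
            groups[int(match.group("gen"))].append(path)
    return dict(groups)


def summarize(groups, now=None):
    """Return (generation, age_in_days, total_bytes) tuples, oldest first."""
    now = time.time() if now is None else now
    rows = []
    for gen, paths in groups.items():
        # Use the most recently modified component as the group's timestamp.
        newest_mtime = max(p.stat().st_mtime for p in paths)
        total_bytes = sum(p.stat().st_size for p in paths)
        rows.append((gen, (now - newest_mtime) / 86400.0, total_bytes))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_sstables.py /path/to/keyspace/CFName
    for gen, age_days, size in summarize(group_sstables(sys.argv[1])):
        print(f"generation {gen}: {age_days:.1f} days old, {size / 1e9:.2f} GB")
```

A group that is far older than the retention window and has not been picked up by compaction would show up at the top of this listing.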
Re: can I kill very old data files in my data folder (I know that sounds crazy but....)
On Wed, Jun 18, 2014 at 10:56 AM, Brian Tarbox tar...@cabotresearch.com wrote:
> I have a column family that only stores the last 5 days' worth of some data...and yet I have files in the data directory for this CF that are 3 weeks old.

Are you using TTL? If so: https://issues.apache.org/jira/browse/CASSANDRA-6654

Are you using size tiered or leveled compaction?

> I have six bunches of these file groups, each with a different value...and with timestamps of each of the last five days...plus one group from 3 weeks ago...which makes me wonder if that group somehow should have been deleted but was not. The files are tens or hundreds of gigs, so deleting would be good, unless it's really bad!

Data files can't be deleted from the data dir with Cassandra running, but it should be fine (if probably technically unsupported) to delete them with Cassandra stopped. In most cases you don't want to do so, because you might un-mask deleted rows or cause unexpected consistency characteristics. In your case, you know that no data in files created 3 weeks ago can possibly have any value, so it is safe to delete them.

=Rob
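
As a rough illustration of the "delete them with Cassandra stopped" step, the sketch below removes every component file of one SSTable as a unit, with a dry run by default. It is a sketch under assumptions, not a supported procedure: the helper name and the example prefix "keyspace-CFName-ic-42-" are hypothetical. On each node the surrounding sequence would presumably be nodetool drain, stop Cassandra (the exact command depends on how it was installed), remove the files, then start Cassandra again.

```python
from pathlib import Path


def remove_generation(data_dir, prefix, dry_run=True):
    """Delete every component file in data_dir whose name starts with prefix.

    prefix is hypothetical, e.g. "keyspace-CFName-ic-42-". The trailing
    hyphen is required so that generation 42 does not also match 420.
    Only run this while Cassandra is stopped on the node.
    Returns the sorted names of the files that were (or would be) removed.
    """
    if not prefix.endswith("-"):
        raise ValueError("prefix must end with '-' to match one generation only")
    victims = sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )
    if not dry_run:
        for path in victims:
            path.unlink()
    return [p.name for p in victims]
```

Calling it first with dry_run=True and checking the returned names against the directory listing is a cheap guard against removing a live SSTable by mistake.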
Re: can I kill very old data files in my data folder (I know that sounds crazy but....)
Rob,

Thank you! We are not using TTL; we're manually deleting data more than 5 days old for this CF. We're running 1.2.13 and are using size tiered compaction (this CF is append-only, i.e. zero updates). Sounds like we can get away with doing a (stop, delete old-data-file, restart) process on a rolling basis, if I understand you.

Thanks,
Brian

On Wed, Jun 18, 2014 at 2:37 PM, Robert Coli rc...@eventbrite.com wrote:
> On Wed, Jun 18, 2014 at 10:56 AM, Brian Tarbox tar...@cabotresearch.com wrote:
>> I have a column family that only stores the last 5 days' worth of some data...and yet I have files in the data directory for this CF that are 3 weeks old.
>
> Are you using TTL? If so: https://issues.apache.org/jira/browse/CASSANDRA-6654
>
> Are you using size tiered or leveled compaction?
>
>> I have six bunches of these file groups, each with a different value...and with timestamps of each of the last five days...plus one group from 3 weeks ago...which makes me wonder if that group somehow should have been deleted but was not. The files are tens or hundreds of gigs, so deleting would be good, unless it's really bad!
>
> Data files can't be deleted from the data dir with Cassandra running, but it should be fine (if probably technically unsupported) to delete them with Cassandra stopped. In most cases you don't want to do so, because you might un-mask deleted rows or cause unexpected consistency characteristics. In your case, you know that no data in files created 3 weeks ago can possibly have any value, so it is safe to delete them.
>
> =Rob
Re: can I kill very old data files in my data folder (I know that sounds crazy but....)
On Wed, Jun 18, 2014 at 12:05 PM, Brian Tarbox tar...@cabotresearch.com wrote:
> Thank you! We are not using TTL; we're manually deleting data more than 5 days old for this CF. We're running 1.2.13 and are using size tiered compaction (this CF is append-only, i.e. zero updates). Sounds like we can get away with doing a (stop, delete old-data-file, restart) process on a rolling basis, if I understand you.

Sure, though in your case (because you're using STS and can) I'd probably just run a major compaction.

=Rob
Re: can I kill very old data files in my data folder (I know that sounds crazy but....)
I don't think I have the space to run a major compaction right now (I'm above 50% disk space used already), and compaction can take extra space, I think?

On Wed, Jun 18, 2014 at 3:24 PM, Robert Coli rc...@eventbrite.com wrote:
> On Wed, Jun 18, 2014 at 12:05 PM, Brian Tarbox tar...@cabotresearch.com wrote:
>> Thank you! We are not using TTL; we're manually deleting data more than 5 days old for this CF. We're running 1.2.13 and are using size tiered compaction (this CF is append-only, i.e. zero updates). Sounds like we can get away with doing a (stop, delete old-data-file, restart) process on a rolling basis, if I understand you.
>
> Sure, though in your case (because you're using STS and can) I'd probably just run a major compaction.
>
> =Rob
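
The space concern can be checked roughly before deciding. A major compaction under size tiered compaction writes the merged SSTable before the inputs are removed, so in the worst case it can temporarily need about as much free space as the Data.db files it merges. The Python sketch below compares that worst case with the free space on the volume; the function names are illustrative, and the estimate is deliberately pessimistic because it ignores any data the compaction would drop.

```python
import shutil
from pathlib import Path


def compaction_headroom(data_dir):
    """Return (worst_case_bytes_needed, bytes_free) for one CF directory.

    worst_case_bytes_needed is the combined size of the *-Data.db files,
    i.e. the size of the output if the compaction discards nothing.
    """
    needed = sum(
        p.stat().st_size
        for p in Path(data_dir).iterdir()
        if p.is_file() and p.name.endswith("-Data.db")
    )
    free = shutil.disk_usage(data_dir).free
    return needed, free


def can_major_compact(data_dir):
    """True if free space covers the worst-case output of a major compaction."""
    needed, free = compaction_headroom(data_dir)
    return free >= needed
```

If this returns False for the CF directory, a major compaction (nodetool compact on that keyspace and column family) risks filling the disk, which matches the worry above about already being over 50% used.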