Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

2014-06-19 Thread Jens Rantil
...and temporarily adding more nodes and rebalancing is not an option?

Sent from Mailbox

On Wed, Jun 18, 2014 at 9:39 PM, Brian Tarbox tar...@cabotresearch.com
wrote:

 I don't think I have the space to run a major compaction right now (I'm
 already above 50% disk usage), and a major compaction can temporarily need
 extra space, I think?
 On Wed, Jun 18, 2014 at 3:24 PM, Robert Coli rc...@eventbrite.com wrote:
 On Wed, Jun 18, 2014 at 12:05 PM, Brian Tarbox tar...@cabotresearch.com
 wrote:

 Thank you!  We are not using TTL; we're manually deleting data more than
 5 days old for this CF.  We're running 1.2.13 and are using size-tiered
 compaction (this CF is append-only, i.e. zero updates).

 Sounds like we can get away with doing a (stop, delete old-data-file,
 restart) process on a rolling basis, if I understand you.


 Sure, though in your case (because you're using size-tiered compaction, and
 so you can) I'd probably just run a major compaction.

 =Rob



can I kill very old data files in my data folder (I know that sounds crazy but....)

2014-06-18 Thread Brian Tarbox
I have a column family that only stores the last 5 days' worth of some
data...and yet I have files in the data directory for this CF that are 3
weeks old.  They take the form:

keyspace-CFName-ic--Filter.db
keyspace-CFName-ic--Index.db
keyspace-CFName-ic--Data.db
keyspace-CFName-ic--Statistics.db
keyspace-CFName-ic--TOC.txt
keyspace-CFName-ic--Summary.db

I have six bunches of these file groups, each with a different generation
number...and with timestamps from each of the last five days...plus one group
from 3 weeks ago...which makes me wonder if that group somehow should have
been deleted but was not.
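(For context, a rough sketch along these lines is how I've been eyeballing
which generation groups are older than our 5-day window; the data directory
path and the cutoff below are placeholders rather than our real layout.)

# Rough sketch: group SSTable component files by generation and flag groups
# whose newest component is older than the 5-day retention window.
# DATA_DIR and the cutoff are placeholders; adjust to your install.
import os
import re
import time
from collections import defaultdict

DATA_DIR = "/var/lib/cassandra/data/keyspace/CFName"   # placeholder path
CUTOFF_SECONDS = 5 * 24 * 3600                          # 5-day retention window

# 1.2.x SSTable components look like: keyspace-CFName-ic-<generation>-Data.db
component = re.compile(r"-ic-(\d+)-[A-Za-z]+\.(?:db|txt)$")

groups = defaultdict(list)
for name in os.listdir(DATA_DIR):
    match = component.search(name)
    if match:
        groups[int(match.group(1))].append(os.path.join(DATA_DIR, name))

now = time.time()
for generation in sorted(groups):
    newest = max(os.path.getmtime(path) for path in groups[generation])
    age_days = (now - newest) / 86400.0
    flag = "STALE" if (now - newest) > CUTOFF_SECONDS else "     "
    print("%s generation %d: %d files, newest %.1f days old"
          % (flag, generation, len(groups[generation]), age_days))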

The files are tens or hundreds of gigs, so deleting would be good, unless
it's really bad!

Thanks,

Brian Tarbox


Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

2014-06-18 Thread Robert Coli
On Wed, Jun 18, 2014 at 10:56 AM, Brian Tarbox tar...@cabotresearch.com
wrote:

 I have a column family that only stores the last 5 days' worth of some
 data...and yet I have files in the data directory for this CF that are 3
 weeks old.


Are you using TTL? If so:

https://issues.apache.org/jira/browse/CASSANDRA-6654

Are you using size-tiered or leveled compaction?

I have six bunches of these file groups, each with a different generation
 number...and with timestamps from each of the last five days...plus one group
 from 3 weeks ago...which makes me wonder if that group somehow should have
 been deleted but was not.

 The files are tens or hundreds of gigs, so deleting would be good, unless
 it's really bad!


Data files can't be deleted from the data dir with Cassandra running, but
it should be fine (though probably technically unsupported) to delete them with
Cassandra stopped. In most cases you don't want to do so, because you might
unmask deleted rows or cause unexpected consistency characteristics.

In your case, you know that no data in files created 3 weeks ago can
possibly still have any value, so it is safe to delete them.

=Rob


Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

2014-06-18 Thread Brian Tarbox
Rob,
Thank you!  We are not using TTL; we're manually deleting data more than 5
days old for this CF.  We're running 1.2.13 and are using size-tiered
compaction (this CF is append-only, i.e. zero updates).

Sounds like we can get away with doing a (stop, delete old-data-file,
restart) process on a rolling basis, if I understand you.
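Roughly what I have in mind for each node is a sketch like this; the
init-script name, the data directory, and the stale generation numbers are
all placeholders we'd fill in per node:

# Very rough per-node sketch of the stop / delete old SSTables / restart step.
# Service name, data directory, and stale generation numbers are placeholders.
import glob
import os
import subprocess

DATA_DIR = "/var/lib/cassandra/data/keyspace/CFName"   # placeholder path
STALE_GENERATIONS = [1234, 1240]                        # identified beforehand

subprocess.check_call(["service", "cassandra", "stop"])  # placeholder init script

for generation in STALE_GENERATIONS:
    # Remove every component of the SSTable (Data, Index, Filter, Summary, ...)
    # together; never leave a partial set behind.
    for path in glob.glob(os.path.join(DATA_DIR, "*-ic-%d-*" % generation)):
        print("removing %s" % path)
        os.remove(path)

subprocess.check_call(["service", "cassandra", "start"])

We'd wait for each node to come back up and look healthy before moving on to
the next one.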

Thanks,

Brian


On Wed, Jun 18, 2014 at 2:37 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jun 18, 2014 at 10:56 AM, Brian Tarbox tar...@cabotresearch.com
 wrote:

 I have a column family that only stores the last 5 days' worth of some
 data...and yet I have files in the data directory for this CF that are 3
 weeks old.


 Are you using TTL? If so:

 https://issues.apache.org/jira/browse/CASSANDRA-6654

 Are you using size-tiered or leveled compaction?

 I have six bunches of these file groups, each with a different generation
 number...and with timestamps from each of the last five days...plus one group
 from 3 weeks ago...which makes me wonder if that group somehow should have
 been deleted but was not.

 The files are tens or hundreds of gigs, so deleting would be good, unless
 it's really bad!


 Data files can't be deleted from the data dir with Cassandra running, but
 it should be fine (though probably technically unsupported) to delete them with
 Cassandra stopped. In most cases you don't want to do so, because you might
 unmask deleted rows or cause unexpected consistency characteristics.

 In your case, you know that no data in files created 3 weeks ago can
 possibly still have any value, so it is safe to delete them.

 =Rob




Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

2014-06-18 Thread Robert Coli
On Wed, Jun 18, 2014 at 12:05 PM, Brian Tarbox tar...@cabotresearch.com
wrote:

 Thank you!  We are not using TTL; we're manually deleting data more than
 5 days old for this CF.  We're running 1.2.13 and are using size-tiered
 compaction (this CF is append-only, i.e. zero updates).

 Sounds like we can get away with doing a (stop, delete old-data-file,
 restart) process on a rolling basis, if I understand you.


Sure, though in your case (because you're using size-tiered compaction, and
so you can) I'd probably just run a major compaction.
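For what it's worth, that's just nodetool compact against the keyspace and
column family on each node; a minimal sketch (the keyspace/CF names below are
placeholders):

# Minimal sketch: trigger a major compaction of one CF via nodetool.
# Keyspace and CF names are placeholders; nodetool must be on the PATH.
import subprocess

subprocess.check_call(["nodetool", "compact", "keyspace", "CFName"])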

=Rob


Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

2014-06-18 Thread Brian Tarbox
I don't think I have the space to run a major compaction right now (I'm
already above 50% disk usage), and a major compaction can temporarily need
extra space, I think?
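My rough understanding (please correct me) is that a major compaction under
size-tiered compaction writes the new SSTable before the old ones are removed,
so in the worst case it needs about as much free space as the CF currently
occupies on disk. A quick headroom check along these lines (the path is a
placeholder) is what makes me hesitate:

# Back-of-the-envelope headroom check before a major compaction.
# Assumes worst case: the new SSTable can be about as large as the CF's
# current on-disk footprint. DATA_DIR is a placeholder.
import os

DATA_DIR = "/var/lib/cassandra/data/keyspace/CFName"   # placeholder path

cf_bytes = sum(
    os.path.getsize(os.path.join(root, name))
    for root, _, names in os.walk(DATA_DIR)
    for name in names
)

stat = os.statvfs(DATA_DIR)
free_bytes = stat.f_bavail * stat.f_frsize

print("CF on disk: %.1f GiB, free: %.1f GiB"
      % (cf_bytes / 2.0 ** 30, free_bytes / 2.0 ** 30))
if free_bytes < cf_bytes:
    print("Probably not enough headroom to major-compact this CF safely.")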


On Wed, Jun 18, 2014 at 3:24 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Jun 18, 2014 at 12:05 PM, Brian Tarbox tar...@cabotresearch.com
 wrote:

 Thank you!  We are not using TTL; we're manually deleting data more than
 5 days old for this CF.  We're running 1.2.13 and are using size-tiered
 compaction (this CF is append-only, i.e. zero updates).

 Sounds like we can get away with doing a (stop, delete old-data-file,
 restart) process on a rolling basis, if I understand you.


 Sure, though in your case (because you're using size-tiered compaction, and
 so you can) I'd probably just run a major compaction.

 =Rob