Re: Query regarding SSTable timestamps and counts

2012-12-10 Thread B. Todd Burruss
My two cents ... I know this thread is a bit old, but the fact that
odd-sized SSTables (usually large ones) will hang around for a while
can be very troublesome for disk space and capacity planning.  Our data in
Cassandra is temporal and is being deleted constantly.  We have seen space
usage in the 1+ TB range when there is actually less than 100 GB of
usable data.  This is because tombstoned data is not deleted
until it is compacted together with its tombstone.  This scenario doesn't
really fit the usual sizing guideline of giving yourself 2x disk space to
allow for compaction.

Our fix was to use leveled compaction, which maintains very low
overhead and removes tombstoned data fairly quickly.  This comes at the
cost of disk I/O, but we are fine with the I/O.
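
A toy sketch of why deleted data can keep occupying disk until it is compacted together with its tombstone. This is not Cassandra code: SSTables are modelled as plain dicts, the names TOMBSTONE, compact and bytes_on_disk are made up for illustration, and gc_grace_seconds is ignored entirely.

```python
TOMBSTONE = None  # hypothetical marker for a deleted column


def compact(sstables):
    """Merge SSTables into one; for each key the newest timestamp wins."""
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    return merged


def bytes_on_disk(sstables):
    """Count payload bytes still stored, whether or not they are shadowed."""
    return sum(len(value) for table in sstables
               for _ts, value in table.values() if value is not TOMBSTONE)


big_old = {"a": (1, "x" * 1000), "b": (1, "y" * 1000)}  # large, rarely compacted
small_new = {"a": (2, TOMBSTONE)}                        # the delete lands elsewhere

before = bytes_on_disk([big_old, small_new])            # shadowed "a" still stored
after = bytes_on_disk([compact([big_old, small_new])])  # "a" payload is gone
```

In this toy model, before is 2000 and after is 1000: the payload for "a" is only reclaimed once the large old file and the file holding the tombstone take part in the same compaction.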



On Tue, Nov 20, 2012 at 5:18 PM, aaron morton aa...@thelastpickle.com wrote:
 upgradesstables re-writes every sstable to have the same contents in the
 newest format.

 Agree.
 In the world of compaction, and excluding upgrades, having older sstables is
 expected.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 21/11/2012, at 11:45 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 On Tue, Nov 20, 2012 at 5:23 PM, aaron morton aa...@thelastpickle.com
 wrote:

 My understanding of the compaction process was that since data files keep
 continuously merging we should not have data files with very old last
 modified timestamps

 It is perfectly OK to have very old SSTables.

 But performing an upgradesstables did decrease the number of data files and
 removed all the data files with the old timestamps.

 upgradesstables re-writes every sstable to have the same contents in the
 newest format.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 19/11/2012, at 4:57 PM, Ananth Gundabattula agundabatt...@gmail.com
 wrote:

 Hello Aaron,

 Thanks a lot for the reply.

 Looks like the documentation is confusing. Here is the link I am referring
 to:  http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction


 It does not disable compaction.

 As per the above url,  After running a major compaction, automatic minor
 compactions are no longer triggered, frequently requiring you to manually
 run major compactions on a routine basis. ( Just before the heading Tuning
 Column Family compression in the above link)

 With respect to the replies below :


 it creates one big file, which will not be compacted until there are (by
 default) 3 other very big files.

 This is for the minor compaction and major compaction should theoretically
 result in one large file irrespective of the number of data files initially?

 This is not something you have to worry about. Unless you are seeing
 1,000's of files using the default compaction.


 Well my worry has been because of the large amount of node movements we have
 done in the ring. We started off with 6 nodes and increased the capacity to
 12 with disproportionate increases every time which resulted in a lot of
 clean of data folders except system, run repair and then a cleanup with an
 aborted attempt in between.

 There were some data.db files older by more than 2 weeks and were not
 modified since then. My understanding of the compaction process was that
 since data files keep continuously merging we should not have data files
 with very old last modified timestamps (assuming there is a good amount of
 writes to the table continuously) I did not have a for sure way of telling
 if everything is alright with the compaction looking at the last modified
 timestamps of all the data.db files.

 What are the compaction issues you are having ?

 Your replies confirm that the timestamps should not be an issue to worry
 about. So I guess I should not be calling them as issues any more.  But
 performing an upgradesstables did decrease the number of data files and
 removed all the data files with the old timestamps.



 Regards,
 Ananth


 On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com
 wrote:


 As per datastax documentation, a manual compaction forces the admin to
 start compaction manually and disables the automated compaction (at least for
 major compactions but not minor compactions )

 It does not disable compaction.
 it creates one big file, which will not be compacted until there are (by
 default) 3 other very big files.


 1. Does a nodetool stop compaction also force the admin to manually run
 major compaction ( I.e. disable automated major compactions ? )

 No.
 Stop just stops the current compaction.
 Nothing is disabled.

 2. Can a node restart reset the automated major compaction if a node gets
 into a manual mode compaction for whatever reason ?

 Major compaction is not automatic. It is the manual nodetool compact
 command.
 Automatic (minor) compaction is controlled by min_compaction_threshold and
 max_compaction_threshold (for the default compaction 

Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread aaron morton
 My understanding of the compaction process was that since data files keep 
 continuously merging we should not have data files with very old last 
 modified timestamps 
It is perfectly OK to have very old SSTables. 

 But performing an upgradesstables did decrease the number of data files and 
 removed all the data files with the old timestamps. 
upgradesstables re-writes every sstable to have the same contents in the newest 
format. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/11/2012, at 4:57 PM, Ananth Gundabattula agundabatt...@gmail.com wrote:

 Hello Aaron,
 
 Thanks a lot for the reply. 
 
 Looks like the documentation is confusing. Here is the link I am referring 
 to:  http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction
 
 
  It does not disable compaction. 
 As per the above url,  After running a major compaction, automatic minor 
 compactions are no longer triggered, frequently requiring you to manually run 
 major compactions on a routine basis. ( Just before the heading Tuning 
 Column Family compression in the above link) 
 
 With respect to the replies below : 
 
 
  it creates one big file, which will not be compacted until there are (by 
  default) 3 other very big files. 
 This is for the minor compaction and major compaction should theoretically 
 result in one large file irrespective of the number of data files initially? 
 
 This is not something you have to worry about. Unless you are seeing 1,000's 
 of files using the default compaction.
 
 Well my worry has been because of the large amount of node movements we have 
 done in the ring. We started off with 6 nodes and increased the capacity to 
 12 with disproportionate increases every time which resulted in a lot of 
 clean of data folders except system, run repair and then a cleanup with an 
 aborted attempt in between.  
 
 There were some data.db files older by more than 2 weeks and were not 
 modified since then. My understanding of the compaction process was that 
 since data files keep continuously merging we should not have data files with 
 very old last modified timestamps (assuming there is a good amount of writes 
 to the table continuously) I did not have a for sure way of telling if 
 everything is alright with the compaction looking at the last modified 
 timestamps of all the data.db files.
 
 What are the compaction issues you are having ? 
 Your replies confirm that the timestamps should not be an issue to worry 
 about. So I guess I should not be calling them as issues any more.  But 
 performing an upgradesstables did decrease the number of data files and 
 removed all the data files with the old timestamps. 
 
 
 
 Regards,
 Ananth  
 
 
 On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com wrote:
 As per datastax documentation, a manual compaction forces the admin to start 
 compaction manually and disables the automated compaction (at least for major 
 compactions but not minor compactions )
 It does not disable compaction. 
 it creates one big file, which will not be compacted until there are (by 
 default) 3 other very big files. 
 
 
 1. Does a nodetool stop compaction also force the admin to manually run 
 major compaction ( I.e. disable automated major compactions ? ) 
 No. 
 Stop just stops the current compaction. 
 Nothing is disabled. 
 
 2. Can a node restart reset the automated major compaction if a node gets 
 into a manual mode compaction for whatever reason ? 
 Major compaction is not automatic. It is the manual nodetool compact command. 
 Automatic (minor) compaction is controlled by min_compaction_threshold and 
 max_compaction_threshold (for the default compaction strategy).
 
 3. What is the ideal  number of SSTables for a table in a keyspace ( I mean 
 are there any indicators as to whether my compaction is alright or not ? )  
 This is not something you have to worry about. 
 Unless you are seeing 1,000's of files using the default compaction. 
 
  For example, I have seen SSTables on the disk more than 10 days old wherein 
 there were other SSTables belonging to the same table but much younger than 
 the older SSTables (
 No problems. 
 
 4. Does a upgradesstables fix any compaction issues ? 
 What are the compaction issues you are having ? 
 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com 
 wrote:
 
 
 We have a cluster  running cassandra 1.1.4. On this cluster, 
 
 1. We had to move the nodes around a bit  when we were adding new nodes 
 (there was quite a good amount of node movement ) 
 
 2. We had to stop compactions during some of the days to save some disk  
 space on some of the nodes when they were running very very low on disk 
 spaces. (via nodetool stop COMPACTION)  
 
 
 As per datastax documentation, 

Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread Edward Capriolo
On Tue, Nov 20, 2012 at 5:23 PM, aaron morton aa...@thelastpickle.com wrote:
 My understanding of the compaction process was that since data files keep
 continuously merging we should not have data files with very old last
 modified timestamps

 It is perfectly OK to have very old SSTables.

 But performing an upgradesstables did decrease the number of data files and
 removed all the data files with the old timestamps.

 upgradesstables re-writes every sstable to have the same contents in the
 newest format.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 19/11/2012, at 4:57 PM, Ananth Gundabattula agundabatt...@gmail.com
 wrote:

 Hello Aaron,

 Thanks a lot for the reply.

 Looks like the documentation is confusing. Here is the link I am referring
 to:  http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction


 It does not disable compaction.
 As per the above url,  After running a major compaction, automatic minor
 compactions are no longer triggered, frequently requiring you to manually
 run major compactions on a routine basis. ( Just before the heading Tuning
 Column Family compression in the above link)

 With respect to the replies below :


 it creates one big file, which will not be compacted until there are (by
 default) 3 other very big files.
 This is for the minor compaction and major compaction should theoretically
 result in one large file irrespective of the number of data files initially?

This is not something you have to worry about. Unless you are seeing
 1,000's of files using the default compaction.

 Well my worry has been because of the large amount of node movements we have
 done in the ring. We started off with 6 nodes and increased the capacity to
 12 with disproportionate increases every time which resulted in a lot of
 clean of data folders except system, run repair and then a cleanup with an
 aborted attempt in between.

 There were some data.db files older by more than 2 weeks and were not
 modified since then. My understanding of the compaction process was that
 since data files keep continuously merging we should not have data files
 with very old last modified timestamps (assuming there is a good amount of
 writes to the table continuously) I did not have a for sure way of telling
 if everything is alright with the compaction looking at the last modified
 timestamps of all the data.db files.

What are the compaction issues you are having ?
 Your replies confirm that the timestamps should not be an issue to worry
 about. So I guess I should not be calling them as issues any more.  But
 performing an upgradesstables did decrease the number of data files and
 removed all the data files with the old timestamps.



 Regards,
 Ananth


 On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com
 wrote:

 As per datastax documentation, a manual compaction forces the admin to
 start compaction manually and disables the automated compaction (at least for
 major compactions but not minor compactions )

 It does not disable compaction.
 it creates one big file, which will not be compacted until there are (by
 default) 3 other very big files.


 1. Does a nodetool stop compaction also force the admin to manually run
 major compaction ( I.e. disable automated major compactions ? )

 No.
 Stop just stops the current compaction.
 Nothing is disabled.

 2. Can a node restart reset the automated major compaction if a node gets
 into a manual mode compaction for whatever reason ?

 Major compaction is not automatic. It is the manual nodetool compact
 command.
 Automatic (minor) compaction is controlled by min_compaction_threshold and
 max_compaction_threshold (for the default compaction strategy).

 3. What is the ideal  number of SSTables for a table in a keyspace ( I
 mean are there any indicators as to whether my compaction is alright or not
 ? )

 This is not something you have to worry about.
 Unless you are seeing 1,000's of files using the default compaction.

  For example, I have seen SSTables on the disk more than 10 days old
 wherein there were other SSTables belonging to the same table but much
 younger than the older SSTables (

 No problems.

 4. Does a upgradesstables fix any compaction issues ?

 What are the compaction issues you are having ?


 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com
 wrote:


 We have a cluster  running cassandra 1.1.4. On this cluster,

 1. We had to move the nodes around a bit  when we were adding new nodes
 (there was quite a good amount of node movement )

 2. We had to stop compactions during some of the days to save some disk
 space on some of the nodes when they were running very very low on disk
 spaces. (via nodetool stop COMPACTION)


 As per datastax documentation, a manual 

Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread Ananth Gundabattula
Thanks a lot Aaron and Edward.

The mail thread clarifies some things for me.

For the benefit of others on this thread: running upgradesstables did
decrease our bloom filter false positive ratios a lot. (upgradesstables
was run not to upgrade from one Cassandra version to a higher one, but
because of all the node movement we had done to grow our cluster in a
staggered way, with aborted attempts in between. I understand that
upgradesstables was not necessarily required for the high bloom filter
false positive rates we were seeing.)


Regards,
Ananth


On Wed, Nov 21, 2012 at 9:45 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 On Tue, Nov 20, 2012 at 5:23 PM, aaron morton aa...@thelastpickle.com
 wrote:
  My understanding of the compaction process was that since data files keep
  continuously merging we should not have data files with very old last
  modified timestamps
 
  It is perfectly OK to have very old SSTables.
 
  But performing an upgradesstables did decrease the number of data files
 and
  removed all the data files with the old timestamps.
 
  upgradesstables re-writes every sstable to have the same contents in the
  newest format.
 
  Cheers
 
  -
  Aaron Morton
  Freelance Cassandra Developer
  New Zealand
 
  @aaronmorton
  http://www.thelastpickle.com
 
  On 19/11/2012, at 4:57 PM, Ananth Gundabattula agundabatt...@gmail.com
  wrote:
 
  Hello Aaron,
 
  Thanks a lot for the reply.
 
  Looks like the documentation is confusing. Here is the link I am
 referring
  to:
 http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction
 
 
  It does not disable compaction.
  As per the above url,  After running a major compaction, automatic minor
  compactions are no longer triggered, frequently requiring you to manually
  run major compactions on a routine basis. ( Just before the heading
 Tuning
  Column Family compression in the above link)
 
  With respect to the replies below :
 
 
  it creates one big file, which will not be compacted until there are (by
  default) 3 other very big files.
  This is for the minor compaction and major compaction should
 theoretically
  result in one large file irrespective of the number of data files
 initially?
 
 This is not something you have to worry about. Unless you are seeing
  1,000's of files using the default compaction.
 
  Well my worry has been because of the large amount of node movements we
 have
  done in the ring. We started off with 6 nodes and increased the capacity
 to
  12 with disproportionate increases every time which resulted in a lot of
  clean of data folders except system, run repair and then a cleanup with
 an
  aborted attempt in between.
 
  There were some data.db files older by more than 2 weeks and were not
  modified since then. My understanding of the compaction process was that
  since data files keep continuously merging we should not have data files
  with very old last modified timestamps (assuming there is a good amount
 of
  writes to the table continuously) I did not have a for sure way of
 telling
  if everything is alright with the compaction looking at the last modified
  timestamps of all the data.db files.
 
 What are the compaction issues you are having ?
  Your replies confirm that the timestamps should not be an issue to worry
  about. So I guess I should not be calling them as issues any more.  But
  performing an upgradesstables did decrease the number of data files and
  removed all the data files with the old timestamps.
 
 
 
  Regards,
  Ananth
 
 
  On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com
  wrote:
 
  As per datastax documentation, a manual compaction forces the admin to
  start compaction manually and disables the automated compaction
 (at least for
  major compactions but not minor compactions )
 
  It does not disable compaction.
  it creates one big file, which will not be compacted until there are (by
  default) 3 other very big files.
 
 
  1. Does a nodetool stop compaction also force the admin to manually run
  major compaction ( I.e. disable automated major compactions ? )
 
  No.
  Stop just stops the current compaction.
  Nothing is disabled.
 
  2. Can a node restart reset the automated major compaction if a node
 gets
  into a manual mode compaction for whatever reason ?
 
  Major compaction is not automatic. It is the manual nodetool compact
  command.
  Automatic (minor) compaction is controlled by min_compaction_threshold
 and
  max_compaction_threshold (for the default compaction strategy).
 
  3. What is the ideal  number of SSTables for a table in a keyspace ( I
  mean are there any indicators as to whether my compaction is alright or
 not
  ? )
 
  This is not something you have to worry about.
  Unless you are seeing 1,000's of files using the default compaction.
 
   For example, I have seen SSTables on the disk more than 10 days old
  wherein there were other SSTables belonging to the same table but much
  

Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread aaron morton
 upgradesstables re-writes every sstable to have the same contents in the
 newest format.
Agree. 
In the world of compaction, and excluding upgrades, having older sstables is 
expected.

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 21/11/2012, at 11:45 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 On Tue, Nov 20, 2012 at 5:23 PM, aaron morton aa...@thelastpickle.com wrote:
 My understanding of the compaction process was that since data files keep
 continuously merging we should not have data files with very old last
 modified timestamps
 
 It is perfectly OK to have very old SSTables.
 
 But performing an upgradesstables did decrease the number of data files and
 removed all the data files with the old timestamps.
 
 upgradesstables re-writes every sstable to have the same contents in the
 newest format.
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 19/11/2012, at 4:57 PM, Ananth Gundabattula agundabatt...@gmail.com
 wrote:
 
 Hello Aaron,
 
 Thanks a lot for the reply.
 
 Looks like the documentation is confusing. Here is the link I am referring
 to:  http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction
 
 
 It does not disable compaction.
 As per the above url,  After running a major compaction, automatic minor
 compactions are no longer triggered, frequently requiring you to manually
 run major compactions on a routine basis. ( Just before the heading Tuning
 Column Family compression in the above link)
 
 With respect to the replies below :
 
 
 it creates one big file, which will not be compacted until there are (by
 default) 3 other very big files.
 This is for the minor compaction and major compaction should theoretically
 result in one large file irrespective of the number of data files initially?
 
 This is not something you have to worry about. Unless you are seeing
 1,000's of files using the default compaction.
 
 Well my worry has been because of the large amount of node movements we have
 done in the ring. We started off with 6 nodes and increased the capacity to
 12 with disproportionate increases every time which resulted in a lot of
 clean of data folders except system, run repair and then a cleanup with an
 aborted attempt in between.
 
 There were some data.db files older by more than 2 weeks and were not
 modified since then. My understanding of the compaction process was that
 since data files keep continuously merging we should not have data files
 with very old last modified timestamps (assuming there is a good amount of
 writes to the table continuously) I did not have a for sure way of telling
 if everything is alright with the compaction looking at the last modified
 timestamps of all the data.db files.
 
 What are the compaction issues you are having ?
 Your replies confirm that the timestamps should not be an issue to worry
 about. So I guess I should not be calling them as issues any more.  But
 performing an upgradesstables did decrease the number of data files and
 removed all the data files with the old timestamps.
 
 
 
 Regards,
 Ananth
 
 
 On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com
 wrote:
 
 As per datastax documentation, a manual compaction forces the admin to
 start compaction manually and disables the automated compaction (at least for
 major compactions but not minor compactions )
 
 It does not disable compaction.
 it creates one big file, which will not be compacted until there are (by
 default) 3 other very big files.
 
 
 1. Does a nodetool stop compaction also force the admin to manually run
 major compaction ( I.e. disable automated major compactions ? )
 
 No.
 Stop just stops the current compaction.
 Nothing is disabled.
 
 2. Can a node restart reset the automated major compaction if a node gets
 into a manual mode compaction for whatever reason ?
 
 Major compaction is not automatic. It is the manual nodetool compact
 command.
 Automatic (minor) compaction is controlled by min_compaction_threshold and
 max_compaction_threshold (for the default compaction strategy).
 
 3. What is the ideal  number of SSTables for a table in a keyspace ( I
 mean are there any indicators as to whether my compaction is alright or not
 ? )
 
 This is not something you have to worry about.
 Unless you are seeing 1,000's of files using the default compaction.
 
 For example, I have seen SSTables on the disk more than 10 days old
 wherein there were other SSTables belonging to the same table but much
 younger than the older SSTables (
 
 No problems.
 
 4. Does a upgradesstables fix any compaction issues ?
 
 What are the compaction issues you are having ?
 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com

Re: Query regarding SSTable timestamps and counts

2012-11-19 Thread Rob Coli
On Sun, Nov 18, 2012 at 7:57 PM, Ananth Gundabattula
agundabatt...@gmail.com wrote:
 As per the above url,  After running a major compaction, automatic minor
 compactions are no longer triggered, frequently requiring you to manually
 run major compactions on a routine basis. ( Just before the heading Tuning
 Column Family compression in the above link)

This inaccurate statement has been questioned a few times on the
mailing list. Generally what happens is people discuss it for about 10
emails and then give up because they can't really make sense of it. If
you google for cassandra-user and that sentence above, you should find
the threads. I suggest mailing d...@datastax.com, explaining your
confusion, and asking them to fix it.

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Query regarding SSTable timestamps and counts

2012-11-18 Thread aaron morton
 As per datastax documentation, a manual compaction forces the admin to start 
 compaction manually and disables the automated compaction (at least for major 
 compactions but not minor compactions )
It does not disable compaction. 
It creates one big file, which will not be compacted until there are (by 
default) 3 other very big files. 


 1. Does a nodetool stop compaction also force the admin to manually run major 
 compaction ( I.e. disable automated major compactions ? ) 
No. 
Stop just stops the current compaction. 
Nothing is disabled. 

 2. Can a node restart reset the automated major compaction if a node gets 
 into a manual mode compaction for whatever reason ? 
Major compaction is not automatic. It is the manual nodetool compact command. 
Automatic (minor) compaction is controlled by min_compaction_threshold and 
max_compaction_threshold (for the default compaction strategy).
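
A simplified sketch of how the default (size-tiered) strategy picks candidates may help show why one very big file waits for 3 others of similar size. It is an approximation written for this thread, not Cassandra's implementation: the function names are invented, and the 0.5x to 1.5x "similar size" window and the thresholds of 4 and 32 are assumed defaults.

```python
def bucket_by_size(sizes, low=0.5, high=1.5):
    """Group SSTable sizes into buckets of roughly similar size."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            if low * average <= size <= high * average:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4, max_threshold=32):
    """Return the buckets that hold enough similar-sized SSTables to compact."""
    return [bucket[:max_threshold]
            for bucket in bucket_by_size(sizes)
            if len(bucket) >= min_threshold]


# One huge file left by a major compaction plus four small flushed files:
# only the small ones qualify, the huge file is left alone.
after_major = compaction_candidates([100_000, 10, 11, 12, 9])

# Three big files are still not enough; the fourth makes them eligible.
three_big = compaction_candidates([100_000, 95_000, 110_000])
four_big = compaction_candidates([100_000, 95_000, 110_000, 105_000])
```

With these inputs, after_major contains only the bucket of four small files, three_big is empty, and four_big contains the four large files.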

 3. What is the ideal  number of SSTables for a table in a keyspace ( I mean 
 are there any indicators as to whether my compaction is alright or not ? )  
This is not something you have to worry about. 
Unless you are seeing 1,000's of files using the default compaction. 

  For example, I have seen SSTables on the disk more than 10 days old wherein 
 there were other SSTables belonging to the same table but much younger than 
 the older SSTables (
No problems. 

 4. Does a upgradesstables fix any compaction issues ? 
What are the compaction issues you are having ? 


Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com wrote:

 
 We have a cluster running Cassandra 1.1.4. On this cluster, 
 
 1. We had to move the nodes around a bit when we were adding new nodes 
 (there was quite a good amount of node movement). 
 
 2. We had to stop compactions on some days to save disk 
 space on some of the nodes when they were running very low on disk 
 space (via nodetool stop COMPACTION). 
 
 
 As per DataStax documentation, a manual compaction forces the admin to start 
 compaction manually and disables the automated compaction (at least for major 
 compactions but not minor compactions).
 
 
 Here are the questions I have regarding compaction: 
 
 1. Does a nodetool stop compaction also force the admin to manually run major 
 compaction (i.e. disable automated major compactions)? 
 
 2. Can a node restart reset the automated major compaction if a node gets 
 into a manual compaction mode for whatever reason? 
 
 3. What is the ideal number of SSTables for a table in a keyspace (I mean, 
 are there any indicators as to whether my compaction is all right or not)? 
 For example, I have seen SSTables on disk more than 10 days old while 
 there were other SSTables belonging to the same table that were much younger 
 (the node movement, repair and cleanup happened 
 between the older SSTables and the new SSTables being touched/modified).
 
 4. Does an upgradesstables fix any compaction issues? 
 
 Regards,
 Ananth



Re: Query regarding SSTable timestamps and counts

2012-11-18 Thread Ananth Gundabattula
Hello Aaron,

Thanks a lot for the reply.

Looks like the documentation is confusing. Here is the link I am referring
to:  http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction


 It does not disable compaction.
As per the above URL: "After running a major compaction, automatic minor
compactions are no longer triggered, frequently requiring you to manually
run major compactions on a routine basis." (This is just before the heading
"Tuning Column Family compression" in the above link.)

With respect to the replies below :


 it creates one big file, which will not be compacted until there are (by
 default) 3 other very big files.
This applies to minor compaction, and a major compaction should
theoretically result in one large file irrespective of the number of
data files initially?

 This is not something you have to worry about. Unless you are seeing
 1,000's of files using the default compaction.

Well, my worry comes from the large amount of node movement we have
done in the ring. We started off with 6 nodes and increased the
capacity to 12 with disproportionate increases every time, which resulted in
a lot of cleaning of data folders (except system), running repair and then a
cleanup, with an aborted attempt in between.

There were some data.db files more than 2 weeks old that had not been
modified since. My understanding of the compaction process was that,
since data files keep continuously merging, we should not have data files
with very old last modified timestamps (assuming there is a good amount of
continuous writes to the table). I did not have a sure way of telling
whether everything is all right with compaction by looking at the last
modified timestamps of all the data.db files.
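
For anyone who wants to look at the same thing, here is a small sketch that lists data files by last-modified age. It assumes the SSTable data files end in "-Data.db" and live somewhere under a data directory you pass in; the function name and the example path are made up, and, as the replies say, an old timestamp on its own does not indicate a problem.

```python
import os
import time


def sstable_ages(data_dir, now=None):
    """Return (age_in_days, path) for each *-Data.db file, oldest first."""
    now = time.time() if now is None else now
    ages = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith("-Data.db"):
                path = os.path.join(root, name)
                age_days = (now - os.path.getmtime(path)) / 86400.0
                ages.append((age_days, path))
    return sorted(ages, reverse=True)


if __name__ == "__main__":
    # Hypothetical location; point this at your own data_file_directories.
    for age_days, path in sstable_ages("/var/lib/cassandra/data"):
        print(f"{age_days:6.1f} days  {path}")
```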

 What are the compaction issues you are having ?
Your replies confirm that the timestamps are not an issue to worry
about, so I guess I should not be calling them issues any more.  But
performing an upgradesstables did decrease the number of data files and
removed all the data files with the old timestamps.



Regards,
Ananth


On Mon, Nov 19, 2012 at 6:54 AM, aaron morton aa...@thelastpickle.com wrote:

 As per datastax documentation, a manual compaction forces the admin to
 start compaction manually and disables the automated compaction (at least
 for major compactions but not minor compactions )

 It does not disable compaction.
 it creates one big file, which will not be compacted until there are (by
 default) 3 other very big files.


 1. Does a nodetool stop compaction also force the admin to manually run
 major compaction ( I.e. disable automated major compactions ? )

 No.
 Stop just stops the current compaction.
 Nothing is disabled.

 2. Can a node restart reset the automated major compaction if a node gets
 into a manual mode compaction for whatever reason ?

 Major compaction is not automatic. It is the manual nodetool compact
 command.
 Automatic (minor) compaction is controlled by min_compaction_threshold and
 max_compaction_threshold (for the default compaction strategy).

 3. What is the ideal  number of SSTables for a table in a keyspace ( I
 mean are there any indicators as to whether my compaction is alright or not
 ? )

 This is not something you have to worry about.
 Unless you are seeing 1,000's of files using the default compaction.

  For example, I have seen SSTables on the disk more than 10 days old
 wherein there were other SSTables belonging to the same table but much
 younger than the older SSTables (

 No problems.

 4. Does a upgradesstables fix any compaction issues ?

 What are the compaction issues you are having ?


 Cheers

 -
 Aaron Morton
 Freelance Cassandra Developer
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 18/11/2012, at 1:18 AM, Ananth Gundabattula agundabatt...@gmail.com
 wrote:


 We have a cluster  running cassandra 1.1.4. On this cluster,

 1. We had to move the nodes around a bit  when we were adding new nodes
 (there was quite a good amount of node movement )

 2. We had to stop compactions during some of the days to save some disk
  space on some of the nodes when they were running very very low on disk
 spaces. (via nodetool stop COMPACTION)


 As per datastax documentation, a manual compaction forces the admin to
 start compaction manually and disables the automated compaction (at least
 for major compactions but not minor compactions )


 Here are the questions I have regarding compaction:

 1. Does a nodetool stop compaction also force the admin to manually run
 major compaction ( I.e. disable automated major compactions ? )

 2. Can a node restart reset the automated major compaction if a node gets
 into a manual mode compaction for whatever reason ?

 3. What is the ideal  number of SSTables for a table in a keyspace ( I
 mean are there any indicators as to whether my compaction is alright or not
 ? )  . For example, I have seen SSTables on the disk more than 10 days old
 wherein there were other SSTables belonging to the