Thanks a lot Aaron and Edward. The mail thread clarifies some things for me.
For the benefit of others on this thread: running upgradesstables did decrease our bloom filter false positive ratios a lot. (We ran upgradesstables not to upgrade from one Cassandra version to a higher one, but because of all the node movement we had done to "upgrade our cluster in a staggered way with aborted attempts in between". I understand that upgradesstables was not necessarily required to address the high bloom filter false positive rates we were seeing.)

Regards,
Ananth

On Wed, Nov 21, 2012 at 9:45 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Tue, Nov 20, 2012 at 5:23 PM, aaron morton <aa...@thelastpickle.com> wrote:
> > My understanding of the compaction process was that since data files keep
> > continuously merging we should not have data files with very old last
> > modified timestamps
> >
> > It is perfectly OK to have very old SSTables.
> >
> > But performing an upgradesstables did decrease the number of data files and
> > removed all the data files with the old timestamps.
> >
> > upgradesstables re-writes every sstable to have the same contents in the
> > newest format.
> >
> > Cheers
> >
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Developer
> > New Zealand
> >
> > @aaronmorton
> > http://www.thelastpickle.com
> >
> > On 19/11/2012, at 4:57 PM, Ananth Gundabattula <agundabatt...@gmail.com> wrote:
> >
> > Hello Aaron,
> >
> > Thanks a lot for the reply.
> >
> > Looks like the documentation is confusing. Here is the link I am referring
> > to: http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction
> >
> >> It does not disable compaction.
> > As per the above url, "After running a major compaction, automatic minor
> > compactions are no longer triggered, frequently requiring you to manually
> > run major compactions on a routine basis."
> > (Just before the heading "Tuning Column Family compression" in the above link.)
> >
> > With respect to the replies below:
> >
> >> it creates one big file, which will not be compacted until there are (by
> >> default) 3 other very big files.
> > This is for minor compaction; a major compaction should theoretically
> > result in one large file irrespective of the number of data files initially?
> >
> >> This is not something you have to worry about. Unless you are seeing
> >> 1,000's of files using the default compaction.
> >
> > Well, my worry has been because of the large amount of node movement we
> > have done in the ring. We started off with 6 nodes and increased the
> > capacity to 12, with disproportionate increases every time, which resulted
> > in many rounds of cleaning the data folders (except system), running repair
> > and then running a cleanup, with an aborted attempt in between.
> >
> > There were some data.db files more than 2 weeks old that had not been
> > modified since then. My understanding of the compaction process was that
> > since data files keep continuously merging we should not have data files
> > with very old last modified timestamps (assuming there is a good amount of
> > writes to the table continuously). I did not have a sure way of telling
> > whether everything was all right with compaction by looking at the last
> > modified timestamps of all the data.db files.
> >
> >> What are the compaction issues you are having ?
> > Your replies confirm that the timestamps should not be an issue to worry
> > about. So I guess I should not be calling them issues any more. But
> > performing an upgradesstables did decrease the number of data files and
> > removed all the data files with the old timestamps.
> >
> > Regards,
> > Ananth
> >
> > On Mon, Nov 19, 2012 at 6:54 AM, aaron morton <aa...@thelastpickle.com> wrote:
> >>
> >> As per datastax documentation, a manual compaction forces the admin to
> >> start compaction manually and disables the automated compaction (at least
> >> for major compactions but not minor compactions)
> >>
> >> It does not disable compaction.
> >> It creates one big file, which will not be compacted until there are (by
> >> default) 3 other very big files.
> >>
> >>
> >> 1. Does a nodetool stop compaction also force the admin to manually run
> >> major compaction (i.e. disable automated major compactions?)
> >>
> >> No.
> >> Stop just stops the current compaction.
> >> Nothing is disabled.
> >>
> >> 2. Can a node restart reset the automated major compaction if a node gets
> >> into a manual mode compaction for whatever reason?
> >>
> >> Major compaction is not automatic. It is the manual nodetool compact
> >> command.
> >> Automatic (minor) compaction is controlled by min_compaction_threshold and
> >> max_compaction_threshold (for the default compaction strategy).
> >>
> >> 3. What is the ideal number of SSTables for a table in a keyspace (I
> >> mean, are there any indicators as to whether my compaction is alright or
> >> not?)
> >>
> >> This is not something you have to worry about.
> >> Unless you are seeing 1,000's of files using the default compaction.
> >>
> >> For example, I have seen SSTables on the disk more than 10 days old
> >> wherein there were other SSTables belonging to the same table but much
> >> younger than the older SSTables (
> >>
> >> No problems.
> >>
> >> 4. Does a upgradesstables fix any compaction issues?
> >>
> >> What are the compaction issues you are having?
> >>
> >>
> >> Cheers
> >>
> >> -----------------
> >> Aaron Morton
> >> Freelance Cassandra Developer
> >> New Zealand
> >>
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 18/11/2012, at 1:18 AM, Ananth Gundabattula <agundabatt...@gmail.com> wrote:
> >>
> >>
> >> We have a cluster running cassandra 1.1.4. On this cluster,
> >>
> >> 1. We had to move the nodes around a bit when we were adding new nodes
> >> (there was quite a good amount of node movement)
> >>
> >> 2. We had to stop compactions during some of the days to save some disk
> >> space on some of the nodes when they were running very low on disk
> >> space (via nodetool stop COMPACTION).
> >>
> >>
> >> As per datastax documentation, a manual compaction forces the admin to
> >> start compaction manually and disables the automated compaction (at least
> >> for major compactions but not minor compactions)
> >>
> >>
> >> Here are the questions I have regarding compaction:
> >>
> >> 1. Does a nodetool stop compaction also force the admin to manually run
> >> major compaction (i.e. disable automated major compactions?)
> >>
> >> 2. Can a node restart reset the automated major compaction if a node gets
> >> into a manual mode compaction for whatever reason?
> >>
> >> 3. What is the ideal number of SSTables for a table in a keyspace (I
> >> mean, are there any indicators as to whether my compaction is alright or
> >> not?). For example, I have seen SSTables on the disk more than 10 days old
> >> wherein there were other SSTables belonging to the same table but much
> >> younger than the older SSTables (the node movement and repair and cleanup
> >> happened between the older SSTables and the new SSTables being
> >> touched/modified).
> >>
> >> 4. Does a upgradesstables fix any compaction issues?
> >>
> >> Regards,
> >> Ananth
> >>
> >>
> >
> >
>
> "it is perfectly OK to have old sstables."
>
> Except for the fact that you can not repair and join new nodes until
> the cluster is all on the same version and all on the same files.
>
> Your gc_grace_seconds defaults to 10 days. This means that if you don't
> repair every node every 10 days something wonky can happen if you do
> deletes.
>
> Also, in the past there was an issue if you upgraded from 0.8.X to
> 1.0.X. 1.0.X did not read some 0.8.X bloom filter files correctly. So
> you could get bad reads until you upgraded the sstables.
>
> These factors cause me to upgrade sstables as soon as possible after an
> upgrade.
>
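
To make Aaron's point about minor compaction concrete (one big file "will not be compacted until there are (by default) 3 other very big files"), here is a minimal Python sketch of size-tiered bucketing. It is a simplified illustration, not Cassandra's actual code: the function names are hypothetical, and the values for the thresholds (4 and 32) and for the "similar size" window (0.5x to 1.5x of the bucket average) are assumptions about the defaults of the default compaction strategy.

```python
# Simplified sketch of size-tiered bucketing; NOT Cassandra's actual code.
# All names and constants below are illustrative assumptions.
MIN_THRESHOLD = 4    # assumed default for min_compaction_threshold
MAX_THRESHOLD = 32   # assumed default for max_compaction_threshold
BUCKET_LOW, BUCKET_HIGH = 0.5, 1.5  # assumed "similar size" window


def buckets(sizes):
    """Group SSTable sizes into buckets of similar size."""
    result = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in result:
            avg = sum(bucket) / len(bucket)
            if BUCKET_LOW * avg <= size <= BUCKET_HIGH * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            result.append([size])
    return result


def minor_compaction_candidates(sizes):
    """Return buckets holding at least MIN_THRESHOLD files, capped at MAX_THRESHOLD."""
    return [b[:MAX_THRESHOLD] for b in buckets(sizes) if len(b) >= MIN_THRESHOLD]


# One very large file (e.g. left by a major compaction) next to a few small ones:
print(minor_compaction_candidates([100000, 10, 11, 12]))         # nothing eligible yet
print(minor_compaction_candidates([100000, 90000, 110000, 95000]))  # the big files now qualify
```

Under these assumptions the large file simply sits untouched until three more files of similar size exist, which is why an old last-modified timestamp on a big SSTable is not by itself a sign of trouble.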