Re: Query regarding SSTable timestamps and counts

Ananth Gundabattula Tue, 20 Nov 2012 15:20:30 -0800

Thanks a lot Aaron and Edward.

The mail thread clarifies some things for me.
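
To check my understanding of what Aaron described, here is a minimal Python sketch of how the default (size-tiered) strategy picks files for a minor compaction. It is a simplification and not Cassandra's actual code: the function names are made up, sizes are plain numbers in MB, and the 0.5x/1.5x "similar size" window is an assumption on my part. Only the thresholds of 4 and 32 correspond to the default min_compaction_threshold and max_compaction_threshold.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of roughly similar size.

    bucket_low / bucket_high are assumed values for the "similar size" window.
    """
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4, max_threshold=32):
    """Return the buckets that hold enough similar-sized files to compact."""
    return [
        bucket[:max_threshold]
        for bucket in bucket_by_size(sizes)
        if len(bucket) >= min_threshold
    ]
```

With sizes [9, 10, 11, 12, 1000] the four small files form a candidate bucket, while the single 1000 MB file (like the one left behind by a major compaction) is not picked up until three other files of similar size exist.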


For the benefit of others on this thread: running upgradesstables did
decrease our bloom filter false positive ratios a lot. (We ran
upgradesstables not to upgrade from one Cassandra version to a higher one,
but because of all the node movement we had done to "upgrade our cluster in
a staggered way with aborted attempts in between". I understand that
upgradesstables was not necessarily required for the high bloom filter
false positive rates we were seeing.)
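
In case it is useful to anyone else, this is roughly how the old data files can be listed by last modified time. It is only a sketch: the function name is made up, the "-Data.db" suffix is what I see on our 1.1.4 nodes, and the data directory you pass in (for example /var/lib/cassandra/data) is an assumption that depends on your install. As the thread makes clear, an old file in this list is not by itself a sign of a problem.

```python
import os
import time


def old_data_files(data_dir, min_age_days=14, now=None):
    """Return (path, age_in_days) for *-Data.db files older than min_age_days."""
    now = time.time() if now is None else now
    found = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if not name.endswith("-Data.db"):
                continue  # skip index, filter and other component files
            path = os.path.join(root, name)
            age_days = (now - os.path.getmtime(path)) / 86400.0
            if age_days >= min_age_days:
                found.append((path, age_days))
    # Oldest files first.
    return sorted(found, key=lambda item: item[1], reverse=True)
```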


Regards,
Ananth


On Wed, Nov 21, 2012 at 9:45 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:

> On Tue, Nov 20, 2012 at 5:23 PM, aaron morton <aa...@thelastpickle.com>
> wrote:
> > My understanding of the compaction process was that since data files keep
> > continuously merging we should not have data files with very old last
> > modified timestamps
> >
> > It is perfectly OK to have very old SSTables.
> >
> > But performing an upgradesstables did decrease the number of data files
> > and removed all the data files with the old timestamps.
> >
> > upgradesstables re-writes every sstable to have the same contents in the
> > newest format.
> >
> > Cheers
> >
> > -----------------
> > Aaron Morton
> > Freelance Cassandra Developer
> > New Zealand
> >
> > @aaronmorton
> > http://www.thelastpickle.com
> >
> > On 19/11/2012, at 4:57 PM, Ananth Gundabattula <agundabatt...@gmail.com>
> > wrote:
> >
> > Hello Aaron,
> >
> > Thanks a lot for the reply.
> >
> > Looks like the documentation is confusing. Here is the link I am
> > referring to:
> > http://www.datastax.com/docs/1.1/operations/tuning#tuning-compaction
> >
> >
> >> It does not disable compaction.
> > As per the above url, "After running a major compaction, automatic minor
> > compactions are no longer triggered, frequently requiring you to manually
> > run major compactions on a routine basis." (Just before the heading
> > "Tuning Column Family compression" in the above link.)
> >
> > With respect to the replies below :
> >
> >
> >> it creates one big file, which will not be compacted until there are (by
> >> default) 3 other very big files.
> > This is for minor compaction. Should a major compaction theoretically
> > result in one large file irrespective of the initial number of data files?
> >
> >>This is not something you have to worry about. Unless you are seeing
> >> 1,000's of files using the default compaction.
> >
> > Well, my worry has been because of the large amount of node movement we
> > have done in the ring. We started off with 6 nodes and increased the
> > capacity to 12 with disproportionate increases every time, which resulted
> > in a lot of cleaning of data folders (except system), running repair and
> > then a cleanup, with an aborted attempt in between.
> >
> > There were some data.db files more than 2 weeks old that had not been
> > modified since then. My understanding of the compaction process was that
> > since data files keep continuously merging we should not have data files
> > with very old last modified timestamps (assuming there is a good amount
> > of writes to the table continuously). I did not have a sure way of
> > telling whether everything was alright with the compaction by looking at
> > the last modified timestamps of all the data.db files.
> >
> >>What are the compaction issues you are having ?
> > Your replies confirm that the timestamps should not be an issue to worry
> > about. So I guess I should not be calling them issues any more. But
> > performing an upgradesstables did decrease the number of data files and
> > removed all the data files with the old timestamps.
> >
> >
> >
> > Regards,
> > Ananth
> >
> >
> > On Mon, Nov 19, 2012 at 6:54 AM, aaron morton <aa...@thelastpickle.com>
> > wrote:
> >>
> >> As per datastax documentation, a manual compaction forces the admin to
> >> start compaction manually and disables the automated compaction (at
> >> least for major compactions but not minor compactions)
> >>
> >> It does not disable compaction.
> >> it creates one big file, which will not be compacted until there are (by
> >> default) 3 other very big files.
> >>
> >>
> >> 1. Does a nodetool stop compaction also force the admin to manually run
> >> major compaction ( I.e. disable automated major compactions ? )
> >>
> >> No.
> >> Stop just stops the current compaction.
> >> Nothing is disabled.
> >>
> >> 2. Can a node restart reset the automated major compaction if a node
> >> gets into a manual mode compaction for whatever reason ?
> >>
> >> Major compaction is not automatic. It is the manual nodetool compact
> >> command.
> >> Automatic (minor) compaction is controlled by min_compaction_threshold
> >> and max_compaction_threshold (for the default compaction strategy).
> >>
> >> 3. What is the ideal number of SSTables for a table in a keyspace (I
> >> mean, are there any indicators as to whether my compaction is alright or
> >> not?)
> >>
> >> This is not something you have to worry about.
> >> Unless you are seeing 1,000's of files using the default compaction.
> >>
> >>  For example, I have seen SSTables on the disk more than 10 days old
> >> wherein there were other SSTables belonging to the same table but much
> >> younger than the older SSTables (
> >>
> >> No problems.
> >>
> >> 4. Does a upgradesstables fix any compaction issues ?
> >>
> >> What are the compaction issues you are having ?
> >>
> >>
> >> Cheers
> >>
> >> -----------------
> >> Aaron Morton
> >> Freelance Cassandra Developer
> >> New Zealand
> >>
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >>
> >> On 18/11/2012, at 1:18 AM, Ananth Gundabattula <agundabatt...@gmail.com>
> >> wrote:
> >>
> >>
> >> We have a cluster  running cassandra 1.1.4. On this cluster,
> >>
> >> 1. We had to move the nodes around a bit  when we were adding new nodes
> >> (there was quite a good amount of node movement )
> >>
> >> 2. We had to stop compactions on some days to save disk space on
> >> some of the nodes when they were running very low on disk space
> >> (via nodetool stop COMPACTION).
> >>
> >>
> >> As per datastax documentation, a manual compaction forces the admin to
> >> start compaction manually and disables the automated compaction (at
> >> least for major compactions but not minor compactions)
> >>
> >>
> >> Here are the questions I have regarding compaction:
> >>
> >> 1. Does a nodetool stop compaction also force the admin to manually run
> >> major compaction ( I.e. disable automated major compactions ? )
> >>
> >> 2. Can a node restart reset the automated major compaction if a node
> >> gets into a manual mode compaction for whatever reason ?
> >>
> >> 3. What is the ideal number of SSTables for a table in a keyspace (I
> >> mean, are there any indicators as to whether my compaction is alright or
> >> not?). For example, I have seen SSTables on the disk more than 10 days
> >> old wherein there were other SSTables belonging to the same table but
> >> much younger than the older SSTables (the node movement and repair and
> >> cleanup happened between the older SSTables and the new SSTables being
> >> touched/modified).
> >>
> >> 4. Does a upgradesstables fix any compaction issues ?
> >>
> >> Regards,
> >> Ananth
> >>
> >>
> >
> >
>
> "it is perfectly OK to have old sstables."
>
> Except for the fact that you cannot repair or join new nodes until
> the whole cluster is on the same version and the same file format.
>
> Your gc_grace_seconds defaults to 10 days. This means that if you don't
> repair every node within 10 days, deleted data can reappear if you do
> deletes.
>
> Also, in the past there was an issue if you upgraded from 0.8.X to
> 1.0.X: 1.0.X did not read some 0.8.X bloom filter files correctly, so
> you could get bad reads until you upgraded sstables.
>
> These factors cause me to upgrade sstables as soon as possible after an
> upgrade.
>
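
On the repair point, a small sketch of the check I plan to script: flag any node whose last successful repair is older than the grace period. The function name and the dictionary of last-repair timestamps are hypothetical (we would have to record those ourselves); 864000 seconds is the 10-day default mentioned above.

```python
GC_GRACE_SECONDS = 864000  # 10 days, the default grace period


def overdue_for_repair(last_repair_by_node, now, gc_grace_seconds=GC_GRACE_SECONDS):
    """Return the nodes whose last repair is older than the grace period.

    last_repair_by_node maps a node name to the Unix timestamp of its last
    successful repair (hypothetical bookkeeping kept outside Cassandra).
    """
    return sorted(
        node
        for node, last_repair in last_repair_by_node.items()
        if now - last_repair > gc_grace_seconds
    )
```

Repairs would need to be scheduled with some margin inside that window, since a repair itself takes time to complete.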
