Hi Roman, I was able to reproduce the issue you described. I filed https://issues.apache.org/jira/browse/CASSANDRA-14479. More details there.
Thanks for reporting!

Jordan

On Wed, May 23, 2018 at 12:06 AM, Roman Bielik <roman.bie...@openmindnetworks.com> wrote:

> Hi,
>
> I apologise for the late response; I wanted to run some further tests so
> I could provide more information to you.
>
> @Jeff, no, I don't set the "only_purge_repaired_tombstone" option. It
> should be the default: False. But no, I don't run repairs during the
> tests.
>
> @Eric, I understand that rapid deletes/inserts are something of an
> antipattern; nevertheless, I'm not experiencing any problems with that
> (except for the 2nd indices).
>
> Update: I ran a new test where I delete the indexed columns explicitly,
> plus delete the whole row at the end. Surprisingly, this test scenario
> works fine. Using nodetool flush + compact (in order to expedite the
> test) seems to always purge the index table. So that's great because I
> seem to have found a workaround; on the other hand, could there be a bug
> in Cassandra - a leaking index table?
>
> Test details:
> - Create table with LeveledCompactionStrategy;
>   'tombstone_compaction_interval': 60; gc_grace_seconds=60
> - There are two indexed columns for comparison: column1, column2
> - Insert keys {1..x} with random values in column1 & column2
> - Delete {key:column2} (but not column1)
> - Delete {key}
> - Repeat n times from the inserts
> - Wait 1 minute
> - nodetool flush
> - nodetool compact (sometimes compact <keyspace> <table.index>)
> - nodetool cfstats
>
> What I observe is that the data table is empty, the column2 index table
> is also empty, and the column1 index table has non-zero (leaked) "space
> used" and "estimated rows".
>
> Roman
>
> On 18 May 2018 at 16:13, Jeff Jirsa <jji...@gmail.com> wrote:
>
>> This would matter for the base table, but would be less likely for the
>> secondary index, where the partition key is the value of the base row.
>>
>> Roman: there's a config option related to only purging repaired
>> tombstones - do you have that enabled? If so, are you running repairs?
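For readers reproducing this, Roman's test loop above roughly corresponds to the following CQL/nodetool sequence. This is only a sketch generated as strings: the keyspace/table names (`ks.t`) are placeholders, not from the thread, and the exact DDL accepted for a COMPACT STORAGE table with secondary indexes varies by Cassandra version.

```python
# Sketch of the CQL/nodetool sequence implied by Roman's test steps.
# ks.t and the key column are illustrative placeholders; column1/column2
# are the indexed columns named in the thread.
import random

def build_test_sequence(n_keys: int = 3) -> list[str]:
    stmts = [
        "CREATE TABLE ks.t (key int PRIMARY KEY, column1 text, column2 text) "
        "WITH COMPACT STORAGE AND compaction = "
        "{'class': 'LeveledCompactionStrategy', "
        "'tombstone_compaction_interval': 60} AND gc_grace_seconds = 60;",
        "CREATE INDEX ON ks.t (column1);",
        "CREATE INDEX ON ks.t (column2);",
    ]
    for key in range(1, n_keys + 1):
        v1, v2 = (f"v{random.randrange(10**6)}" for _ in range(2))
        stmts.append(
            f"INSERT INTO ks.t (key, column1, column2) "
            f"VALUES ({key}, '{v1}', '{v2}');"
        )
        # Delete only the column2 cell (not column1), then the whole row.
        stmts.append(f"DELETE column2 FROM ks.t WHERE key = {key};")
        stmts.append(f"DELETE FROM ks.t WHERE key = {key};")
    # After repeating the loop and waiting out gc_grace (60 s):
    stmts += ["nodetool flush", "nodetool compact ks t", "nodetool cfstats ks.t"]
    return stmts

for s in build_test_sequence():
    print(s)
```

Per Roman's observation, after this sequence the column1 index table is the one that retains non-zero "space used", since only column2 received an explicit cell delete.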
>>
>> --
>> Jeff Jirsa
>>
>>> On May 18, 2018, at 6:41 AM, Eric Stevens <migh...@gmail.com> wrote:
>>>
>>> The answer to Question 3 is "yes." One of the more subtle points about
>>> tombstones is that Cassandra won't remove them during compaction if
>>> there is a bloom filter on any SSTable on that replica indicating that
>>> it contains the same partition (not primary) key. This holds even if
>>> the tombstone is older than gc_grace and would otherwise be a
>>> candidate for cleanup.
>>>
>>> If you're recycling partition keys, your tombstones may never be able
>>> to be cleaned up, because in this scenario there is a high probability
>>> that an SSTable not involved in that compaction also contains the same
>>> partition key, and so compaction cannot have confidence that it's safe
>>> to remove the tombstone (it would have to fully materialize every
>>> record in the compaction, which is too expensive).
>>>
>>> In general it is an antipattern in Cassandra to write to a given
>>> partition indefinitely, for this and other reasons.
>>>
>>> On Fri, May 18, 2018 at 2:37 AM Roman Bielik <
>>> roman.bie...@openmindnetworks.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a Cassandra 3.11 table (with compact storage) and am using
>>>> secondary indices with rather unique data stored in the indexed
>>>> columns. There are many inserts and deletes, so in order to avoid
>>>> tombstones piling up I'm re-using primary keys from a pool (which
>>>> works fine). I'm aware that this design pattern is not ideal, but for
>>>> now I cannot change it easily.
>>>>
>>>> The problem is, the size of the 2nd index tables keeps growing
>>>> (filled with tombstones) no matter what.
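Eric's point above about why compaction keeps tombstones can be illustrated with a toy model. This is a simplified pure-Python sketch, not Cassandra's actual code: the bloom filter is modelled as a plain set (real bloom filters can also answer "maybe" for keys they don't hold), and the purge rule is reduced to its two essential conditions.

```python
# Toy model of the tombstone-purge check: a compaction may drop a tombstone
# only if it is older than gc_grace AND no SSTable outside the compaction
# might contain the same partition key (checked via per-SSTable bloom
# filters, modelled here as sets). Simplified sketch, not Cassandra code.
import time

GC_GRACE_SECONDS = 600

class SSTable:
    def __init__(self, partition_keys):
        self.keys = set(partition_keys)  # stands in for the bloom filter

    def might_contain(self, key):
        return key in self.keys  # a real bloom filter may false-positive

def can_purge(tombstone_key, tombstone_ts, now, sstables_outside_compaction):
    if now - tombstone_ts < GC_GRACE_SECONDS:
        return False  # still within gc_grace: must be kept regardless
    # If any SSTable outside this compaction may hold the same partition
    # key, the tombstone must be kept (it may still shadow live data).
    return not any(t.might_contain(tombstone_key)
                   for t in sstables_outside_compaction)

now = time.time()
old_ts = now - 3600  # tombstone written an hour ago, well past gc_grace

# Re-used partition key "k1" also lives in an SSTable not being compacted,
# so its tombstone survives; "k2" appears nowhere else, so it can go.
outside = [SSTable({"k1", "k9"})]
print(can_purge("k1", old_ts, now, outside))  # False: key recycled elsewhere
print(can_purge("k2", old_ts, now, outside))  # True
```

This is why recycling partition keys keeps tombstones alive: every recycled key is highly likely to appear in some SSTable outside any given compaction.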
>>>>
>>>> I tried some aggressive configuration (just for testing) in order to
>>>> expedite the tombstone removal, but with little-to-zero effect:
>>>> COMPACTION = { 'class': 'LeveledCompactionStrategy',
>>>> 'unchecked_tombstone_compaction': 'true',
>>>> 'tombstone_compaction_interval': 600 }
>>>> gc_grace_seconds = 600
>>>>
>>>> I'm aware that perhaps Materialized Views could provide a solution to
>>>> this, but I'm bound to the Thrift interface, so I cannot use them.
>>>>
>>>> Questions:
>>>> 1. Is there something I'm missing? How come compaction does not
>>>> remove the obsolete indices/tombstones from the 2nd index tables? Can
>>>> I trigger the cleanup manually somehow? I have tried nodetool flush,
>>>> compact, and rebuild_index on both the data table and the internal
>>>> index table, but with no result.
>>>>
>>>> 2. When deleting a record I'm deleting the whole row at once - which
>>>> would create one tombstone for the whole record, if I'm correct.
>>>> Would it help to delete the indexed columns separately, creating an
>>>> extra tombstone for each cell? As I understand the underlying
>>>> mechanism, the indexed column value must be read in order for a
>>>> proper tombstone for the index to be created.
>>>>
>>>> 3. Could the fact that I'm reusing the primary key of a deleted
>>>> record shortly after for a new insert interact with the secondary
>>>> index tombstone removal?
>>>>
>>>> Will be grateful for any advice.
>>>>
>>>> Regards,
>>>> Roman
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
>> For additional commands, e-mail: dev-h...@cassandra.apache.org