[
https://issues.apache.org/jira/browse/CASSANDRA-9291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526656#comment-14526656
]
Luis Correia commented on CASSANDRA-9291:
-----------------------------------------
I cannot see this as a Minor issue, as it can bring down a healthy cluster just
by creating a few more CFs.
Please mind that in a new cluster the problem can be mitigated by remodeling
your data.
In an existing cluster (mind Production!) you can _prevent clients from
connecting just by creating new CFs_. The solution then is to wait out the
gc_grace_period, hoping that compaction removes enough tombstones (that's at
least 7 days).
I've tried importing the sstables into a new cluster and shifting the time
(skipping to the day compaction should clear the tombstones), various schema
import/export gymnastics, etc.
Nothing worked. I had to delete system.schema_columns and re-create only the
right number of CFs (fixing the token range for each node) in order to get my
clients connecting again.
> Too many tombstones in schema_columns from creating too many CFs
> ----------------------------------------------------------------
>
> Key: CASSANDRA-9291
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9291
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Production Cluster with 2 DCs of 3 nodes each and 1 DC
> of 7 nodes, running on dedicated Xeon hexacore, 96GB ram, RAID for Data and
> SSD for commitlog, running Debian 7 (with Java 1.7.0_76-b13 64-Bit, 8GB and
> 16GB of heap tested).
> Dev Cluster with 1 DC with 3 nodes and 1 DC with 1 node, running on
> virtualized env., Ubuntu 12.04.5 (with Java 1.7.0_72-b14 64-Bit 1GB, 4GB
> heap)
> Reporter: Luis Correia
> Priority: Blocker
> Attachments: after_schema.txt, before_schema.txt, schemas500.cql
>
>
> When creating lots of columnfamilies (about 200), system.schema_columns
> fills with tombstones, which prevents clients using the binary protocol
> from connecting.
> Clients already connected continue normal operation (reading and inserting).
> Log messages are:
> For the first tries (sorry for the lack of precision):
> bq. ERROR [main] 2015-04-22 00:01:38,527 SliceQueryFilter.java (line 200)
> Scanned over 100000 tombstones in system.schema_columns; query aborted (see
> tombstone_failure_threshold)
> For each client that tries to connect but fails with timeout:
> bq. WARN [ReadStage:35] 2015-04-27 15:40:10,600 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:40] 2015-04-27 15:40:10,609 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:61] 2015-04-27 15:40:10,670 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:51] 2015-04-27 15:40:10,670 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:55] 2015-04-27 15:40:10,675 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:35] 2015-04-27 15:40:10,707 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:40] 2015-04-27 15:40:10,708 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:43] 2015-04-27 15:40:10,715 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:51] 2015-04-27 15:40:10,736 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:61] 2015-04-27 15:40:10,736 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:35] 2015-04-27 15:40:10,750 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> bq. WARN [ReadStage:40] 2015-04-27 15:40:10,751 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> bq. WARN [ReadStage:55] 2015-04-27 15:40:10,759 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:51] 2015-04-27 15:40:10,821 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> bq. WARN [ReadStage:61] 2015-04-27 15:40:10,822 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> bq. WARN [ReadStage:43] 2015-04-27 15:40:10,827 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:55] 2015-04-27 15:40:10,838 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> bq. WARN [ReadStage:62] 2015-04-27 15:40:10,846 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:43] 2015-04-27 15:40:10,862 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> bq. WARN [ReadStage:62] 2015-04-27 15:40:10,898 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:62] 2015-04-27 15:40:10,970 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> This happens independently of values in:
> tombstone_warn_threshold, tombstone_failure_threshold
> Thrift doesn't seem to be vulnerable to this, as cqlsh connects every time.
> Binary-protocol clients tested with Java (and Clojure), Python, and C++ all
> fail to connect.
> Attached are a CQL script to replicate the problem (with 500 CFs) and the
> sstablemetadata output of a system.schema_columns sstable before (clean
> cluster) and after importing the schema.
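To compare live and tombstoned cell counts across the SliceQueryFilter warnings quoted in the description, something like the following rough sketch may help. It is not part of the report: the function name `summarize` and the regular expression are assumptions based only on the log format shown above, and the sample reuses two of the quoted warnings with the `> bq.` prefixes stripped.

```python
import re

# Matches the counts and table name in the warning format quoted above.
# The pattern is an assumption derived from those lines only.
WARN_RE = re.compile(r"Read (\d+) live and (\d+) tombstoned cells in (\S+)")


def summarize(log_text):
    """Return {table: (warnings, total_live, total_tombstoned)}."""
    # The log lines are hard-wrapped in the email, so collapse whitespace
    # before matching.
    flat = " ".join(log_text.split())
    totals = {}
    for live, dead, table in WARN_RE.findall(flat):
        count, live_sum, dead_sum = totals.get(table, (0, 0, 0))
        totals[table] = (count + 1, live_sum + int(live), dead_sum + int(dead))
    return totals


# Two of the warnings from the report, quote prefixes removed.
sample = """
WARN [ReadStage:35] 2015-04-27 15:40:10,600 SliceQueryFilter.java (line
231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
WARN [ReadStage:35] 2015-04-27 15:40:10,707 SliceQueryFilter.java (line
231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
"""

for table, (count, live, dead) in summarize(sample).items():
    print(f"{table}: {count} warnings, {live} live, {dead} tombstoned, "
          f"{dead / live:.2f} tombstones per live cell")
```

In these two warnings the reads scan roughly three tombstoned cells for every live one. The totals are sums over warnings, not distinct cells, since the same rows are read repeatedly by different ReadStage threads.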