[ https://issues.apache.org/jira/browse/CASSANDRA-9291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660844#comment-14660844 ]
Ben Bromhead commented on CASSANDRA-9291:
-----------------------------------------
It worked for us, but we had to repeat it a few times on the impacted
nodes until we got them to stream from the right nodes. It helps to have more
good nodes than broken ones in this case, and you need at least one good node.
resetLocalSchema() in the MigrationManager picks the first node from
Gossiper.instance.getLiveMembers(), so as long as your cluster has at least
one good node, retrying will eventually get you back to a normal state.
I think you can get the liveMembers list via JMX (correct me if I'm wrong here)
to help figure out which node it will use; otherwise you can roll through the
whole cluster, excluding the known good nodes.
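
A minimal sketch of the "roll through the whole cluster" option, assuming the
reset is triggered with `nodetool resetlocalschema` and that nodetool can reach
each node's JMX port. The function name and the addresses are hypothetical, and
the script only builds the commands; it does not run them. Because each reset
pulls from whichever live member comes first, a node may still need more than
one pass.

```python
from typing import Iterable, List


def plan_schema_resets(all_nodes: Iterable[str],
                       known_good: Iterable[str]) -> List[List[str]]:
    """Return one `nodetool resetlocalschema` command per node not known to be good."""
    good = set(known_good)
    if not good:
        # A reset only helps if some node still holds a sane schema to pull from.
        raise ValueError("need at least one known-good node")
    return [
        ["nodetool", "-h", node, "resetlocalschema"]
        for node in all_nodes
        if node not in good
    ]


# Hypothetical addresses, for illustration only.
cluster = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
commands = plan_schema_resets(cluster, known_good=["10.0.0.1"])
for argv in commands:
    print(" ".join(argv))
```

Each printed line can then be run by hand, checking the node's schema version
before moving on to the next one.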
> Too many tombstones in schema_columns from creating too many CFs
> ----------------------------------------------------------------
>
> Key: CASSANDRA-9291
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9291
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Production Cluster with 2 DCs of 3 nodes each and 1 DC
> of 7 nodes, running on dedicated Xeon hexacore, 96GB RAM, RAID for Data and
> SSD for commitlog, running Debian 7 (with Java 1.7.0_76-b13 64-Bit, 8GB and
> 16GB of heap tested).
> Dev Cluster with 1 DC with 3 nodes and 1 DC with 1 node, running on
> virtualized env., Ubuntu 12.04.5 (with Java 1.7.0_72-b14 64-Bit 1GB, 4GB
> heap)
> Reporter: Luis Correia
> Priority: Blocker
> Attachments: after_schema.txt, before_schema.txt, schemas500.cql
>
>
> When creating lots of column families (about 200), system.schema_columns
> fills up with tombstones, which prevents clients using the binary
> protocol from connecting.
> Clients already connected continue normal operation (reading and inserting).
> Log messages are:
> For the first tries (sorry for the lack of precision):
> bq. ERROR [main] 2015-04-22 00:01:38,527 SliceQueryFilter.java (line 200)
> Scanned over 100000 tombstones in system.schema_columns; query aborted (see
> tombstone_failure_threshold)
> For each client that tries to connect but fails with a timeout:
> bq. WARN [ReadStage:35] 2015-04-27 15:40:10,600 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:40] 2015-04-27 15:40:10,609 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:61] 2015-04-27 15:40:10,670 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:51] 2015-04-27 15:40:10,670 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:55] 2015-04-27 15:40:10,675 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:35] 2015-04-27 15:40:10,707 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:40] 2015-04-27 15:40:10,708 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:43] 2015-04-27 15:40:10,715 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:51] 2015-04-27 15:40:10,736 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:61] 2015-04-27 15:40:10,736 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:35] 2015-04-27 15:40:10,750 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> bq. WARN [ReadStage:40] 2015-04-27 15:40:10,751 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> bq. WARN [ReadStage:55] 2015-04-27 15:40:10,759 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:51] 2015-04-27 15:40:10,821 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> bq. WARN [ReadStage:61] 2015-04-27 15:40:10,822 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> bq. WARN [ReadStage:43] 2015-04-27 15:40:10,827 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:55] 2015-04-27 15:40:10,838 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> bq. WARN [ReadStage:62] 2015-04-27 15:40:10,846 SliceQueryFilter.java (line
> 231) Read 395 live and 1217 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147283441 columns was requested, slices=[-]
> bq. WARN [ReadStage:43] 2015-04-27 15:40:10,862 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> bq. WARN [ReadStage:62] 2015-04-27 15:40:10,898 SliceQueryFilter.java (line
> 231) Read 1146 live and 3534 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147282894 columns was requested, slices=[-]
> bq. WARN [ReadStage:62] 2015-04-27 15:40:10,970 SliceQueryFilter.java (line
> 231) Read 864 live and 2664 tombstoned cells in system.schema_columns (see
> tombstone_warn_threshold). 2147281748 columns was requested, slices=[-]
> This happens regardless of the values of tombstone_warn_threshold and
> tombstone_failure_threshold.
> Thrift doesn't seem to be vulnerable to this, as cqlsh connects every time.
> Binary protocol clients tested with Java (and Clojure), Python, and C++ all
> fail to connect.
> Attached are a CQL script to reproduce the problem (with 500 CFs) and the
> sstablemetadata output of a system.schema_columns sstable before (clean
> cluster) and after importing the schema.
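
The WARN lines quoted above can be summarised with a short script. This is
only a sketch: the regular expression is inferred from the message text in
this report, not taken from the Cassandra source, and the function names are
hypothetical. It assumes each log entry has been joined back onto a single
line.

```python
import re
from typing import Optional, Tuple

# Pattern inferred from the quoted SliceQueryFilter warnings (an assumption).
TOMBSTONE_WARNING = re.compile(
    r"Read (\d+) live and (\d+) tombstoned cells in (\S+)"
)


def parse_tombstone_warning(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (live, tombstoned, table) for a warning line, or None if no match."""
    match = TOMBSTONE_WARNING.search(line)
    if match is None:
        return None
    live, tombstoned, table = match.groups()
    return int(live), int(tombstoned), table


def tombstone_fraction(live: int, tombstoned: int) -> float:
    """Fraction of the cells read that were tombstones."""
    total = live + tombstoned
    return tombstoned / total if total else 0.0


# One of the warnings from the report, unwrapped onto a single line.
sample = (
    "WARN [ReadStage:35] 2015-04-27 15:40:10,600 SliceQueryFilter.java "
    "(line 231) Read 395 live and 1217 tombstoned cells in "
    "system.schema_columns (see tombstone_warn_threshold). "
    "2147283441 columns was requested, slices=[-]"
)

parsed = parse_tombstone_warning(sample)
if parsed is not None:
    live, tombstoned, table = parsed
    print(f"{table}: {tombstone_fraction(live, tombstoned):.1%} tombstones")
```

For the three distinct warnings in the log (395/1217, 1146/3534 and 864/2664
live/tombstoned cells), roughly three quarters of the cells read were
tombstones.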