bootstrap failure and strange gossiper state

2015-03-15 Thread Karl Rieb
I am also experiencing issues bootstrapping new nodes in my 2.0.10
Cassandra cluster.  The first attempt to bootstrap ALWAYS fails, followed
by a second bootstrap attempt that ALWAYS succeeds.

The first attempt at bootstrapping fails with:

 INFO [main] 2015-03-15 02:41:02,550 StorageService.java (line 966) JOINING: Starting to bootstrap...
ERROR [main] 2015-03-15 02:41:02,872 CassandraDaemon.java (line 513) Exception encountered during startup
java.lang.IllegalStateException: unable to find sufficient sources for streaming range (7169067280919608187,7171404468239785904]
    at org.apache.cassandra.dht.RangeStreamer.getRangeFetchMap(RangeStreamer.java:201)
    at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:125)
    at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:72)
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:994)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:797)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:502)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)


After the failure, I

   1. stop the node,
   2. clear out the data/saved_caches/commitlog directories,
   3. remove the node from all peers (usually by manually deleting the node
   from their peers table), and
   4. restart the node to re-attempt the bootstrap.

The bootstrap always seems to work on this second attempt.
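
The cleanup steps above can be sketched in Python. This is only an illustration under assumptions: the helper names are hypothetical, the directories passed in depend on your own cassandra.yaml, and the CQL statement assumes the node is removed from the system.peers table by its IP address. Stopping and restarting the node is left to your service manager.

```python
import shutil
from pathlib import Path


def clear_directory_contents(directory: Path) -> None:
    """Delete everything inside `directory` but keep the directory itself."""
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def peers_delete_statement(node_ip: str) -> str:
    """Build the CQL to run on each existing node so it forgets the failed joiner."""
    return f"DELETE FROM system.peers WHERE peer = '{node_ip}';"


def reset_failed_bootstrap(directories: list[Path], node_ip: str) -> str:
    """Clear the node's local state (step 2) and return the CQL for step 3.

    `directories` is an assumption: pass the data, saved_caches and
    commitlog locations configured for the node being re-bootstrapped.
    """
    for directory in directories:
        clear_directory_contents(directory)
    return peers_delete_statement(node_ip)
```

Run reset_failed_bootstrap only while the failed node is stopped, then execute the returned statement on every existing node before restarting the new one.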


I tried comparing the logs from the failed bootstrap and the successful
one, and the main difference I see is that the failed bootstrap contains
many unknown endpoint lines:

 INFO [main] 2015-03-15 02:40:25,160 StorageService.java (line 966) JOINING: waiting for ring information
 INFO [HANDSHAKE-/10.30.30.30] 2015-03-15 02:40:25,175 OutboundTcpConnection.java (line 386) Handshaking version with /10.30.30.30
 INFO [HANDSHAKE-/10.4.4.4] 2015-03-15 02:40:25,279 OutboundTcpConnection.java (line 386) Handshaking version with /10.4.4.4
 INFO [HANDSHAKE-/10.20.20.20] 2015-03-15 02:40:25,383 OutboundTcpConnection.java (line 386) Handshaking version with /10.20.20.20
 INFO [HANDSHAKE-/10.10.10.10] 2015-03-15 02:40:25,489 OutboundTcpConnection.java (line 386) Handshaking version with /10.10.10.10
 INFO [HANDSHAKE-/10.5.5.5] 2015-03-15 02:40:25,596 OutboundTcpConnection.java (line 386) Handshaking version with /10.5.5.5
 INFO [RequestResponseStage:3] 2015-03-15 02:40:25,700 Gossiper.java (line 876) InetAddress /10.2.2.2 is now UP
ERROR [MigrationStage:1] 2015-03-15 02:40:25,701 FailureDetector.java (line 200) unknown endpoint /10.2.2.2
ERROR [MigrationStage:1] 2015-03-15 02:40:25,701 MigrationTask.java (line 55) Can't send migration request: node /10.2.2.2 is down.
 INFO [RequestResponseStage:4] 2015-03-15 02:40:25,716 Gossiper.java (line 876) InetAddress /10.1.1.1 is now UP
ERROR [MigrationStage:1] 2015-03-15 02:40:25,716 FailureDetector.java (line 200) unknown endpoint /10.1.1.1
ERROR [MigrationStage:1] 2015-03-15 02:40:25,716 MigrationTask.java (line 55) Can't send migration request: node /10.1.1.1 is down.
 INFO [RequestResponseStage:1] 2015-03-15 02:40:25,719 Gossiper.java (line 876) InetAddress /10.3.3.3 is now UP
ERROR [MigrationStage:1] 2015-03-15 02:40:25,720 FailureDetector.java (line 200) unknown endpoint /10.3.3.3
ERROR [MigrationStage:1] 2015-03-15 02:40:25,720 MigrationTask.java (line 55) Can't send migration request: node /10.3.3.3 is down.
 INFO [RequestResponseStage:2] 2015-03-15 02:40:25,739 Gossiper.java (line 876) InetAddress /10.4.4.4 is now UP
ERROR [MigrationStage:1] 2015-03-15 02:40:25,739 FailureDetector.java (line 200) unknown endpoint /10.4.4.4
ERROR [MigrationStage:1] 2015-03-15 02:40:25,740 MigrationTask.java (line 55) Can't send migration request: node /10.4.4.4 is down.
 INFO [RequestResponseStage:3] 2015-03-15 02:40:25,742 Gossiper.java (line 876) InetAddress /10.30.30.30 is now UP
ERROR [MigrationStage:1] 2015-03-15 02:40:25,743 FailureDetector.java (line 200) unknown endpoint /10.30.30.30
ERROR [MigrationStage:1] 2015-03-15 02:40:25,743 MigrationTask.java (line 55) Can't send migration request: node /10.30.30.30 is down.
 INFO [RequestResponseStage:4] 2015-03-15 02:40:25,747 Gossiper.java (line 876) InetAddress /10.20.20.20 is now UP
ERROR [MigrationStage:1] 2015-03-15 02:40:25,747 FailureDetector.java (line 200) unknown endpoint /10.20.20.20
ERROR [MigrationStage:1] 2015-03-15 02:40:25,748 MigrationTask.java (line 55) Can't send migration request: node /10.20.20.20 is down.
 INFO [RequestResponseStage:1] 2015-03-15 02:40:25,823 Gossiper.java (line 876) InetAddress /10.5.5.5 is now UP
ERROR [MigrationStage:1] 2015-03-15 

Re: Cassandra metrics Graphite

2014-12-17 Thread Karl Rieb
This seemed to be due to a bug in how metric names are converted to file
system paths. The metric name is converted to a path by replacing all dots
with slashes, so a name with a leading dot turns into an absolute path
(e.g. /org/apache/cassandra). os.path.join() is then used to append it to the
storage directory, but os.path.join() discards every earlier component once it
meets an absolute one, so you end up doing something like:

os.path.join('/opt/graphite/storage/whatever', '/org/apache/cassandra/etc')

which returns just '/org/apache/cassandra/etc'. I manually tweaked the Python
code to strip any leading dots from the metric name as a temporary workaround.
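
To make the path problem concrete, here is a minimal sketch of the behaviour. It is not Graphite's actual code: WHISPER_ROOT and metric_to_path are hypothetical names, and the storage root is an assumed location. It shows that joining an absolute second component drops the storage root, and that stripping leading dots first avoids it.

```python
import posixpath

# Hypothetical storage root; the real location depends on the Graphite install.
WHISPER_ROOT = "/opt/graphite/storage/whisper"


def metric_to_path(metric: str, strip_leading_dots: bool) -> str:
    """Map a dotted metric name to a .wsp file path under WHISPER_ROOT."""
    if strip_leading_dots:
        # The workaround: drop leading dots so the result stays relative.
        metric = metric.lstrip(".")
    relative = metric.replace(".", "/") + ".wsp"
    # posixpath.join discards WHISPER_ROOT if `relative` starts with "/".
    return posixpath.join(WHISPER_ROOT, relative)


if __name__ == "__main__":
    name = ".org.apache.cassandra.metrics.count"
    print(metric_to_path(name, strip_leading_dots=False))
    print(metric_to_path(name, strip_leading_dots=True))
```

Without the strip, the first call yields a path at the root of the file system, which matches the "No such file or directory" error reported in the quoted message.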

-Karl



 On Dec 17, 2014, at 11:04 AM, Nigel LEACH nigel.le...@uk.bnpparibas.com 
 wrote:
 
 I'm running Cassandra 2.0.11.83 (via DSE 4.6.0), and Graphite 
 0.9.10. I know a bit about Cassandra, but not much about Graphite.
 
 Our Graphite server exposes system metrics, and also those from the example 
 python scripts, successfully.
 
 I can see Cassandra metrics hitting the Graphite server, but errors in the 
 console log suggest they are being written to the root file system:
 
 exceptions.IOError: [Errno 2] No such file or directory: 
 '/org/apache/cassandra/metrics/ColumnFamily/system/sstable_activity/WriteTotalLatency/count.wsp'
 
 Whereas, I think, it should be going to something like this
 
 /var/lib/carbon/whisper/carbon/agents/org/apache/cassandra/metrics/ColumnFamily/system/sstable_activity/WriteTotalLatency/count.wsp
 
 I'm losing the prefix directory path somewhere, but don't know where to 
 configure it.
 
 On the Cassandra side all I have added is a call to metricsGraphite.yaml, 
 which contains
 
 graphite:
   -
     period: 60
     timeunit: 'SECONDS'
     hosts:
       - host: '10.11.12.13'
         port: 2003
     predicate:
       color: white
       useQualifiedName: true
       patterns:
         - ^org.apache.cassandra.metrics.+
 
 On the Graphite side I simply have the following in Carbon's 
 storage-schemas.conf file
 
 [cassandra]
 pattern=cassandra
 retentions = 60:90d
 
 Any hints to what is going wrong?
 
 Many Thanks
 Nigel
 
 


2.0.10 debian/ubuntu apt release?

2014-09-12 Thread Karl Rieb
Hi,

Wondering when 2.0.10 will be available through the datastax apt repository?

-Karl


Re: 2.0.10 debian/ubuntu apt release?

2014-09-12 Thread Karl Rieb
Awesome! Thanks!

-Karl

 On Sep 12, 2014, at 5:34 PM, Michael Shuler mich...@pbandjelly.org wrote:
 
 On 09/12/2014 01:50 PM, Karl Rieb wrote:
 Hi,
 
 Wondering when 2.0.10 will be available through the datastax apt repository?
 
 I'll have 2.0.10 deb/rpm packages in the repos on Monday, barring any issues. 
  You can certainly pull the identical cassandra deb package from the Apache 
 apt repository.  Thanks for your patience!
 
 http://www.apache.org/dist/cassandra/debian/pool/main/c/cassandra/cassandra_2.0.10_all.deb
 
 sources.list entry:
 
 deb http://www.apache.org/dist/cassandra/debian 20x main
 
 Apache Cassandra apt repo key instructions are here:
 
 http://wiki.apache.org/cassandra/DebianPackaging
 
 -- 
 Kind regards,
 Michael


Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9

2014-07-21 Thread Karl Rieb
I did not include unit tests in my patch. I think many people did not run into 
this issue because many Cassandra clients handle the DateType when found as a 
CUSTOM type. 

-Karl

 On Jul 21, 2014, at 8:26 PM, Robert Coli rc...@eventbrite.com wrote:
 
 On Mon, Jul 21, 2014 at 1:58 AM, Ben Hood 0x6e6...@gmail.com wrote:
 On Sat, Jul 19, 2014 at 7:35 PM, Karl Rieb karl.r...@gmail.com wrote:
  Can now be followed at:
  https://issues.apache.org/jira/browse/CASSANDRA-7576.
 
 Nice work! Finally we have a proper solution to this issue, so well done to 
 you.
 
 For reference, I consider this issue of sufficient severity to recommend 
 against upgrading to any version of 2.0 before 2.0.10, unless you are certain 
 you have no such schema.
 
 I'm pretty sure reversed comparator timestamps are a common type of schema, 
 given that there are blog posts recommending their use, so I struggle to 
 understand how this was not detected by unit tests.
 
 Does your fix add unit tests which would catch this case on upgrade?
 
 =Rob
 


Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9

2014-07-19 Thread Karl Rieb
Ben! I think I have an idea of exactly where the bug is!

I did some more searching and discovered the difference that causes some
tables to produce the wrong type and others to be okay: *the tables with
the wrong type reverse the ordering of the timestamp column*.

The bug is in org.apache.cassandra.transport.DataType:fromType(AbstractType):

    public static Pair<DataType, Object> fromType(AbstractType type)
    {
        // For CQL3 clients, ReversedType is an implementation detail and they
        // shouldn't have to care about it.
        if (type instanceof ReversedType)
            type = ((ReversedType)type).baseType;
        // For compatibility sake, we still return DateType as the timestamp type in resultSet metadata (#5723)
        else if (type instanceof DateType)
            type = TimestampType.instance;

        DataType dt = dataTypeMap.get(type);
        if (dt == null)
        {
            if (type.isCollection())
            {
                if (type instanceof ListType)
                {
                    return Pair.<DataType, Object>create(LIST, ((ListType)type).elements);
                }
                else if (type instanceof MapType)
                {
                    MapType mt = (MapType)type;
                    return Pair.<DataType, Object>create(MAP, Arrays.asList(mt.keys, mt.values));
                }
                else
                {
                    assert type instanceof SetType;
                    return Pair.<DataType, Object>create(SET, ((SetType)type).elements);
                }
            }
            return Pair.<DataType, Object>create(CUSTOM, type.toString());
        }
        else
        {
            return Pair.create(dt, null);
        }
    }

The issue is the else if, which does not check the base type of the
reversed column:

    if (type instanceof ReversedType)
        type = ((ReversedType)type).baseType;
    // For compatibility sake, we still return DateType as the timestamp type in resultSet metadata (#5723)
    *else if* (type instanceof DateType)
        type = TimestampType.instance;

The else should be removed to make it just:

    if (type instanceof ReversedType)
        type = ((ReversedType)type).baseType;
    // For compatibility sake, we still return DateType as the timestamp type in resultSet metadata (#5723)
    *if* (type instanceof DateType)
        type = TimestampType.instance;

This way we do a check for DateType on the base type of reversed columns!
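
The control-flow difference can be reproduced in a small Python model. This is a sketch, not Cassandra code: the classes below are hypothetical stand-ins for ReversedType, DateType and TimestampType, and the two functions mirror only the first two branches of fromType.

```python
class DateType:
    """Stand-in for the legacy date type."""


class TimestampType:
    """Stand-in for the timestamp type clients expect."""


class ReversedType:
    """Stand-in for a wrapper that reverses the ordering of a base type."""

    def __init__(self, base_type):
        self.base_type = base_type


DATE = DateType()
TIMESTAMP = TimestampType()


def resolve_with_else_if(column_type):
    """Mirrors the original logic: the second check is skipped after unwrapping."""
    if isinstance(column_type, ReversedType):
        column_type = column_type.base_type
    elif isinstance(column_type, DateType):
        column_type = TIMESTAMP
    return column_type


def resolve_with_two_ifs(column_type):
    """Mirrors the proposed fix: the unwrapped base type is checked as well."""
    if isinstance(column_type, ReversedType):
        column_type = column_type.base_type
    if isinstance(column_type, DateType):
        column_type = TIMESTAMP
    return column_type


if __name__ == "__main__":
    reversed_date = ReversedType(DATE)
    print(type(resolve_with_else_if(reversed_date)).__name__)
    print(type(resolve_with_two_ifs(reversed_date)).__name__)
```

With the else if, a reversed date column keeps its DateType and would fall through to the CUSTOM branch; with two separate checks it resolves to the timestamp type, the same as a non-reversed column.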

I applied the fix to my 2.0.9 cassandra node and the errors go away!

Could you guys please make this single-word fix?

-Karl



On Fri, Jul 18, 2014 at 1:30 PM, Ben Hood 0x6e6...@gmail.com wrote:

 On Fri, Jul 18, 2014 at 3:03 PM, Karl Rieb karl.r...@gmail.com wrote:
  Why is the protocol ID correct for some tables but not others?

 I have no idea.

  Why does it work when I do a clean install on a new 2.0.x cluster?

 I still have no idea.

  The bug seems to be on the Cassandra side and the clients seem to just
 be providing patches to these issues.

 It was reported to the Cassandra list, but there was no answer,
 potentially because the query was sent to the wrong list, but I don't
 really know. Maybe it should have gone into Jira, but it's unclear as
 to whether this is a client or a server issue.

 In any case, it didn't look like the server behavior was going to
 change any time soon, so we just took the pragmatic approach in gocql
 and worked around the issue.

  I will post to the Datastax java driver mailing list and see if they are
 willing to add a patch.

 That sounds like a good idea, seeing as the workaround has been tested
 before.

 Sorry to be of little help to you.



Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9

2014-07-19 Thread Karl Rieb
Will do!

 On Jul 19, 2014, at 11:22 AM, Robert Stupp sn...@snazy.de wrote:
 
 Can you submit a ticket in C* JIRA at issues.apache.org?
 
 --
 Sent from my iPhone 
 


Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9

2014-07-19 Thread Karl Rieb
Can now be followed at: https://issues.apache.org/jira/browse/CASSANDRA-7576


On Sat, Jul 19, 2014 at 1:03 PM, Karl Rieb karl.r...@gmail.com wrote:

 Will do!

 On Jul 19, 2014, at 11:22 AM, Robert Stupp sn...@snazy.de wrote:

 Can you submit a ticket in C* JIRA at issues.apache.org?

 --
 Sent from my iPhone


Re: DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9

2014-07-18 Thread Karl Rieb
Thanks Ben,

I found that thread, but my concern is the inconsistency on the Cassandra side. 
Why is the protocol ID correct for some tables but not others? Why does it work 
when I do a clean install on a new 2.0.x cluster?  The bug seems to be on the 
Cassandra side and the clients seem to just be providing patches to these 
issues.

I will post to the Datastax java driver mailing list and see if they are 
willing to add a patch. 

-Karl

 On Jul 18, 2014, at 3:59 AM, Ben Hood 0x6e6...@gmail.com wrote:
 
 On Fri, Jul 18, 2014 at 3:38 AM, Karl Rieb karl.r...@gmail.com wrote:
 Any suggestions on what is going on or how to fix it?
 
 I'm not sure how much this will help, but one of the gocql users
 reported similar symptoms when upgrading to 2.0.6. We ended up
 applying a client side patch to address the issue, the details are
 here:
 
 https://github.com/gocql/gocql/pull/154
 
 That pull request also references the original bug report:
 
 https://github.com/gocql/gocql/issues/151
 
 Not sure how helpful this will be though.


DataType protocol ID error for TIMESTAMPs when upgrading from 1.2.11 to 2.0.9

2014-07-17 Thread Karl Rieb
Hi,

I've been testing an in-place upgrade of a 1.2.11 cluster to 2.0.9.  The
1.2.11 nodes all have a schema defined through CQL with existing data
before I perform the rolling upgrade.  While the upgrade is in progress,
services are continuing to read and write data to the cluster (strictly
using protocol version 1).  I drain each node one at a time, upgrade the
configuration files, upgrade Cassandra, then start the node back up.  The
Cassandra logs show no errors or exceptions during startup, and each node
appears to join properly with the other nodes in the cluster.

On our service side, everything goes smoothly except for queries against a
few of our tables.  On some of the tables with timestamp columns (not all),
we will get an error from the Datastax java-driver when binding
PreparedStatements or trying to process ResultSets:

com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 2 of CQL type 'org.apache.cassandra.db.marshal.DateType', expecting class java.nio.ByteBuffer but class java.util.Date provided
    at com.datastax.driver.core.BoundStatement.bind(BoundStatement.java:190)
    at com.datastax.driver.core.DefaultPreparedStatement.bind(DefaultPreparedStatement.java:103)


I traced the code on the driver side, and I see it has to do with bad
DataType information coming back from a table metadata query.  The 2.0.9
nodes will return protocol ID 0 instead of 11 for some timestamp column
definitions.  The protocol ID 0 maps to a custom type, and the 2.0.9
nodes specify org.apache.cassandra.db.marshal.DateType as the custom type
name.  The 1.2.11 nodes, however, continue to send 11 for their protocol
ID, which gets properly mapped to the timestamp data type.
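
As an illustration of what the client sees, here is a sketch of how a driver might resolve a column type from the protocol ID, plus the kind of client-side workaround one could apply. It is not the code of any actual driver: resolve_column_type and the returned strings are hypothetical, and only IDs 0 (custom) and 11 (timestamp) are modelled.

```python
CUSTOM_ID = 0x0000      # custom type, followed by a class name
TIMESTAMP_ID = 0x000B   # 11, the timestamp type

DATE_TYPE_CLASS = "org.apache.cassandra.db.marshal.DateType"


def resolve_column_type(type_id, custom_class=None, treat_date_as_timestamp=True):
    """Return a readable type name for a (protocol ID, class name) pair."""
    if type_id == TIMESTAMP_ID:
        return "timestamp"
    if type_id == CUSTOM_ID:
        # Workaround: a custom DateType is handled like a native timestamp.
        if treat_date_as_timestamp and custom_class == DATE_TYPE_CLASS:
            return "timestamp"
        return f"custom({custom_class})"
    return f"unmodelled(0x{type_id:04x})"


if __name__ == "__main__":
    # What a 1.2.11 node sends for a timestamp column.
    print(resolve_column_type(TIMESTAMP_ID))
    # What a 2.0.9 node sends for the affected columns, without the workaround.
    print(resolve_column_type(CUSTOM_ID, DATE_TYPE_CLASS, treat_date_as_timestamp=False))
```

Without the workaround the column is treated as an opaque custom type, which is consistent with the driver expecting a java.nio.ByteBuffer in the exception above.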

Strangely, not all our tables with timestamp columns have this issue.

If I bring up an entirely new 2.0.9 cluster (no existing data), and
provision our schema, then there are no issues.  The proper protocol ID,
11, gets sent for all our tables with timestamp columns.

I have tried doing nodetool upgradesstables and nodetool scrub on the
nodes, but neither fixes the issue.

Any suggestions on what is going on or how to fix it?