Re: Cassandra 4.0.6 token mismatch issue in production environment

Jaydeep Chovatia Mon, 23 Oct 2023 16:27:49 -0700

Sounds good. Thanks a lot for all your help!

Jaydeep


On Mon, Oct 23, 2023 at 3:30 PM Jeff Jirsa <jji...@gmail.com> wrote:

> Not aware of any that survive node restart, though in the past, there were
> races around starting an expansion while one node was partitioned/down (and
> missing the initial gossip / UP). A heap dump could have told us a bit more
> conclusively, but it's hard to guess for now.
>
>
>
> On Mon, Oct 23, 2023 at 3:22 PM Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> The issue persisted on a few nodes despite no changes to the
>> topology. Even restarting the nodes did not help. The issue was resolved
>> only after we evacuated those nodes.
>>
>> Can you think of a situation in which this could happen?
>>
>> Jaydeep
>>
>> On Sat, Oct 21, 2023 at 10:25 AM Jaydeep Chovatia <
>> chovatia.jayd...@gmail.com> wrote:
>>
>>> Thanks, Jeff!
>>> I will keep this thread updated on our findings.
>>>
>>> Jaydeep
>>>
>>> On Sat, Oct 21, 2023 at 9:37 AM Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>> That code path was added to protect against invalid gossip states.
>>>>
>>>> For this message to be logged, the coordinator receiving the query must
>>>> identify a set of replicas holding the data to serve the read, and one of
>>>> the selected replicas must disagree that it’s a replica based on its view
>>>> of the token ring.
>>>>
>>>> This probably means that at least one node in your cluster has an
>>>> invalid view of the ring. If you run “nodetool ring” on every host
>>>> and compare the output, you’ll probably notice that one or more is wrong.
>>>>
>>>> It’s also possible for this to happen for a few seconds while adding /
>>>> moving / removing hosts.
>>>>
>>>> If you weren’t changing the topology of the cluster, bouncing the
>>>> cluster will likely fix it.
>>>>
>>>> (I’m unsure of the defaults and not able to look them up, but Cassandra
>>>> can either log, or log and drop the read. You probably want to log and
>>>> drop the read, which is the right solution: it won’t accidentally return
>>>> a missing / empty result set as a valid query result, and instead forces
>>>> the read to go to other replicas or time out.)
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Oct 20, 2023, at 10:57 PM, Jaydeep Chovatia <
>>>> chovatia.jayd...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am using Cassandra 4.0.6 in production and am receiving the following
>>>> error. It indicates that the Cassandra nodes have a mismatch in token ownership.
>>>>
>>>> Has anyone seen this issue before?
>>>>
>>>> Received a read request from /XX.XX.XXX.XXX:YYYYY for a range that is not 
>>>> owned by the current replica Read(keyspace.table columns=*/[c1] rowFilter= 
>>>> limits=LIMIT 100 key=7BE78B90-AD66-406B-AA05-6A062F72F542:0 
>>>> filter=slice(slices=ALL, reversed=false), nowInSec=1697751757).
>>>>
>>>> Jaydeep
>>>>
>>>>
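
A minimal sketch of the comparison Jeff suggests above (run "nodetool ring" on every host and compare the results), written in Python. It assumes the usual text layout in which each data row starts with the node address and ends with an integer token (Murmur3Partitioner or RandomPartitioner). The names parse_ring and find_disagreements are hypothetical, and collecting the output from each host (over ssh or otherwise) is left out. Treating the most common view as the reference is only a heuristic for spotting the odd one out; it does not prove which node is right.

```python
import ipaddress
from collections import Counter


def parse_ring(output):
    """Return the set of (address, token) pairs found in `nodetool ring` text.

    Assumption: data rows begin with an IP address and end with an integer
    token. Header, separator, and wrap-around lines are skipped.
    """
    pairs = set()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            ipaddress.ip_address(fields[0])  # raises ValueError if not an IP
            token = int(fields[-1])          # raises ValueError if not an integer
        except ValueError:
            continue
        pairs.add((fields[0], token))
    return pairs


def find_disagreements(ring_by_host):
    """Map each dissenting host to the pairs on which it differs from the most common view.

    `ring_by_host` maps a host name to the raw `nodetool ring` text captured
    on that host. The "most common view" is a heuristic, not ground truth.
    """
    if not ring_by_host:
        return {}
    views = {host: frozenset(parse_ring(text)) for host, text in ring_by_host.items()}
    majority, _ = Counter(views.values()).most_common(1)[0]
    # Symmetric difference: pairs present in exactly one of the two views.
    return {host: view ^ majority for host, view in views.items() if view != majority}
```

A host that shows up in the result is a candidate for a stale or invalid ring view. An empty result means every host reported the same (address, token) pairs, which would point away from a simple ring mismatch.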
