Re: Issue with native protocol

2021-07-29 Thread Erick Ramirez
Then that's the cause: the node negotiates down to an older protocol
version by design, to deal with mixed-version clusters, as Sam described
in his response. As Bowen stated, you must have an old node left over from
when it was still a C* 2.2 cluster. You probably tried to
remove/decommission it but ran into issues, so it's still hanging around in
gossip.

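You can manually delete that node's row from system.peers. Run this in cqlsh
while connected to the problematic node, since system.peers is local to each node: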

cqlsh> DELETE FROM system.peers WHERE peer = '10.39.36.152';


There's a good chance that you need to delete it multiple times -- it's a
race with gossip re-populating the table. Also check that it's completely
gone from nodetool gossipinfo. Once you're convinced that it's no longer in
gossip and in peers, you'll need to restart the node so it defaults back to
v4. Good luck!
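A minimal Python sketch of the "delete it until it stays deleted" loop described above. It assumes a cassandra-driver `Session` whose queries reach only the problematic node (system.peers is node-local, so a session that load-balances across the cluster could run the DELETE on the wrong node; the driver's `WhiteListRoundRobinPolicy` is one way to pin it). `purge_ghost_peer`, the retry count and the pause are illustrative choices, not part of any tool. Checking `nodetool gossipinfo` and restarting the node remain manual steps.

```python
import time


def purge_ghost_peer(session, peer_ip, max_attempts=5, pause_s=1.0):
    """Hypothetical helper: delete peer_ip from system.peers until a re-read no longer finds it.

    `session` is assumed to be a cassandra-driver Session routed only to the
    affected node. Returns the number of DELETE attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        session.execute("DELETE FROM system.peers WHERE peer = %s", (peer_ip,))
        # Give gossip a moment; if it still knows the peer, it may re-insert the row.
        time.sleep(pause_s)
        row = session.execute(
            "SELECT peer FROM system.peers WHERE peer = %s", (peer_ip,)
        ).one()
        if row is None:
            return attempt
    raise RuntimeError(f"{peer_ip} reappeared in system.peers after {max_attempts} deletes")
```

If the row keeps coming back after several attempts, that suggests gossip still holds state for the peer, which is why the `nodetool gossipinfo` check matters before restarting.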


Re: Issue with native protocol

2021-07-29 Thread Srinivas Polamarasetty
We don’t see this node as part of our cluster. It’s not listed in nodetool status 
either.



Regards,
Srinivas P


From: Erick Ramirez 
Date: Friday, 30 July 2021 at 9:55 AM
To: user@cassandra.apache.org 
Subject: Re: Issue with native protocol
Is 10.39.36.152 part of the cluster or is it dead?


Re: Issue with native protocol

2021-07-29 Thread Erick Ramirez
Is 10.39.36.152 part of the cluster or is it dead?



Re: Issue with native protocol

2021-07-29 Thread Srinivas Polamarasetty
Hi,

I am getting the following output on the problematic node, whereas I get no output on the working 
node.


cassandra@cqlsh> select peer, host_id, release_version from system.peers where release_version < '3.0.0' allow filtering;

 peer         | host_id                              | release_version
--------------+--------------------------------------+-----------------
 10.39.36.152 | b2d1191e-fdd0-492d-b31f-083fb7109909 |           2.2.5






Regards,
Srinivas P


From: Bowen Song 
Date: Thursday, 29 July 2021 at 2:20 PM
To: user@cassandra.apache.org 
Subject: Re: Issue with native protocol

Can you please run the following query on the problematic node?

select peer, host_id, release_version from system.peers where release_version < 
'3.0.0' allow filtering;
I suspect you have a "ghost" 2.x node in the system.peers table on this 
problematic node.

The "ghost" node will not show up in the "nodetool status" output, because it 
does not exist except in the system.peers table.



Re: [RELEASE] Apache Cassandra 4.0.0 released

2021-07-29 Thread Adam Scott
Thank you Scott!

We installed 4.0 over the top of 3.11 and it's working fine. Easy non-upgrade path :)

Thanks again,

Adam


On 2021/07/28 22:49:11, Scott Andreas  wrote: 
> If you're running Cassandra 3.x, the only data file requirement is that all 
> SSTables present on your cluster are 3.x-era SSTables.
> 
> This means that you should/should have run upgradesstables at least once on 
> Cassandra 3.x after upgrading from an earlier release before upgrading to 
> Cassandra 4.0. This is necessary because Cassandra 4.0 cannot read the legacy 
> Cassandra 2.x SSTable format.
> 
> In the same vein, it's not necessary to run upgradesstables after upgrading 
> from Cassandra 3.x to Cassandra 4.0 as 4.0 is able to read 3.x's data format 
> fine. At a later time in the future, it may be necessary to run 
> upgradesstables prior to upgrading to confirm there are no 3.x SSTables left 
> -- but no such post-4.0 SSTable format changes have been proposed yet.
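One way to confirm the "3.x-era SSTables only" requirement before moving to 4.0 is to look at the format version embedded in each `*-Data.db` filename. This Python sketch assumes the filename layouts used from 2.1 through 4.0 (`<ks>-<table>-<version>-<generation>-Data.db` in 2.1, `<version>-<generation>-big-Data.db` from 2.2 on) and treats any version whose first letter sorts before `m` (for example `ka`, `la`, `lb`) as pre-3.0. Both are assumptions to check against your own data directories; `scan_data_dir` and its path argument are hypothetical.

```python
import re
from pathlib import Path

# Two lowercase letters, e.g. "ka", "lb", "mc", "nb" (assumed version token shape).
_VERSION_TOKEN = re.compile(r"^[a-z]{2}$")


def sstable_version(filename):
    """Return the format version token of a *-Data.db filename, or None."""
    if not filename.endswith("-Data.db"):
        return None
    parts = filename.split("-")
    # The version is assumed to sit immediately before the numeric generation.
    for i, part in enumerate(parts):
        if part.isdigit() and i > 0 and _VERSION_TOKEN.match(parts[i - 1]):
            return parts[i - 1]
    return None


def legacy_sstables(filenames):
    """Filenames whose version letter sorts before 'm' (assumed to mean pre-3.0)."""
    legacy = []
    for name in filenames:
        version = sstable_version(name)
        if version is not None and version[0] < "m":
            legacy.append(name)
    return legacy


def scan_data_dir(data_dir):
    """Hypothetical helper: list pre-3.0 Data.db files anywhere under data_dir."""
    return legacy_sstables([p.name for p in Path(data_dir).rglob("*-Data.db")])
```

An empty result from `scan_data_dir` on every node is consistent with upgradesstables having already rewritten the 2.x files; a non-empty one points at tables to rewrite on 3.x first.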
> 
> 
> From: Adam Scott 
> Sent: Wednesday, July 28, 2021 2:58 PM
> To: user@cassandra.apache.org
> Subject: Re: [RELEASE] Apache Cassandra 4.0.0 released
> 
> Thanks Brandon!
> 
> Anyone know the upgrade path from 3.x?  
> https://cassandra.apache.org/doc/latest/cassandra/getting_started/installing.html
>  Doesn't look like it has specific upgrade instructions.
> 
> For instance do we need to run nodetool upgradesstables?
> 
> TIA
> 
> 
> On 2021/07/26 20:03:59, Brandon Williams  wrote:
> > The Cassandra team is pleased to announce the release of Apache
> > Cassandra version 4.0.0.
> >
> > Apache Cassandra is a fully distributed database. It is the right
> > choice when you need scalability and high availability without
> > compromising performance.
> >
> > http://cassandra.apache.org/
> >
> > Downloads of source and binary distributions are available in our
> > download section:
> >
> > http://cassandra.apache.org/download/
> >
> > This version is the initial release in the 4.0 series. As always,
> > please pay attention to the release notes[2] and let us know[3] if you
> > encounter any problems.
> >
> > Enjoy!
> >
> > [1]: CHANGES.txt
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=CHANGES.txt;hb=refs/tags/cassandra-4.0.0
> > [2]: NEWS.txt 
> > https://gitbox.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=NEWS.txt;hb=refs/tags/cassandra-4.0.0
> > [3]: https://issues.apache.org/jira/browse/CASSANDRA
> >
> 


Re: High memory usage during nodetool repair

2021-07-29 Thread Amandeep Srivastava
Hi Erick,

Limiting mmap to index files only (disk_access_mode: mmap_index_only) seems
to have resolved the issue. Max RAM usage stayed at 60% this time. Could you
please point me to the limitations of setting this param? For starters, I can
see read performance dropping by up to 30% (CASSANDRA-8464).

Also, could you please shed light on the additional questions in my earlier
email?

Thanks a lot.

Regards,
Aman


-- 
Regards,
Aman


Re: Issue with native protocol

2021-07-29 Thread Bowen Song

Can you please run the following query on the problematic node?

   select peer, host_id, release_version from system.peers where release_version < '3.0.0' allow filtering;

I suspect you have a "ghost" 2.x node in the system.peers table on this 
problematic node.


The "ghost" node will not show up in the "nodetool status" output, 
because it does not exist except in the system.peers table.
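The same check can also be done client-side. CQL compares `release_version` as text, so `< '3.0.0'` is a lexicographic comparison; this Python sketch parses versions into integer tuples instead, and also flags peers whose host ID is missing from `nodetool status`, which is how a ghost is described above. It assumes you have already fetched the `system.peers` rows from the suspect node and collected the host IDs from `nodetool status` yourself; `parse_version` and `suspect_peers` are hypothetical helpers.

```python
def parse_version(text):
    """'3.11.5' -> (3, 11, 5); drops a suffix such as '-SNAPSHOT'."""
    core = text.split("-")[0]
    return tuple(int(part) for part in core.split("."))


def suspect_peers(peer_rows, ring_host_ids, min_version=(3, 0, 0)):
    """Return (peer, too_old, not_in_ring) for rows that look like ghosts.

    peer_rows: iterable of (peer, host_id, release_version) tuples, e.g. from
      SELECT peer, host_id, release_version FROM system.peers on the suspect node.
    ring_host_ids: set of host ID strings reported by `nodetool status`.
    """
    suspects = []
    for peer, host_id, release_version in peer_rows:
        too_old = (release_version is not None
                   and parse_version(release_version) < min_version)
        # str() so that uuid.UUID values from the driver compare with plain strings.
        not_in_ring = str(host_id) not in ring_host_ids
        if too_old or not_in_ring:
            suspects.append((peer, too_old, not_in_ring))
    return suspects
```

A row flagged both `too_old` and `not_in_ring` matches the ghost pattern: a 2.x entry that exists only in this node's system.peers.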


On 29/07/2021 05:22, Srinivas Polamarasetty wrote:
We have tried restarting a couple of times and the issue remains the same, so we 
repointed to another node and were able to communicate with the DB. We think 
the issue is with this particular node only.


Regards,
Srinivas


*From:* manish khandelwal 
*Sent:* Thursday, July 29, 2021 8:31:17 AM
*To:* user@cassandra.apache.org 
*Subject:* Re: Issue with native protocol
Have you tried restarting your application? It should renegotiate the 
protocol during the handshake and should resolve the issue.


On Wed, Jul 28, 2021 at 3:06 PM Srinivas Polamarasetty wrote:


The query shows 4, but the app team is also getting the error below. They were not
able to communicate with this node.

com.datastax.driver.core.exceptions.CodecNotFoundException: Codec
not found for requested operation:
['org.apache.cassandra.db.marshal.ShortType' <-> java.lang.Short]

[2021-07-09 23:26:52.382 -0700] 
com.datastax.driver.core.Connection - DEBUG: Got unsupported
protocol version error from /: for version V4 server
supports version V3

[2021-07-09 23:26:52.382 -0700] 
com.datastax.driver.core.Connection - DEBUG:
Connection[//: -1, inFlight=0, closed=true] closing
connection

[2021-07-09 23:26:52.382 -0700] 
com.datastax.driver.core.Host.STATES - DEBUG: [//:]
Connection[/10.39.38.166:9042-1, inFlight=0, closed=true] closed,
remaining = 0

[2021-07-09 23:26:52.383 -0700] 
com.datastax.driver.core.Cluster - DEBUG: Cannot connect with
protocol V4, trying V3

Regards,

Srinivas P

*From: *Erick Ramirez
*Date: *Wednesday, 28 July 2021 at 12:39 PM
*To: *user@cassandra.apache.org
*Subject: *Re: Issue with native protocol

Someone asked me about the same issue a couple of months ago and
we never managed to figure out why the wrong version is being
displayed.

Could you try to run `SELECT native_protocol_version FROM
system.local`? It should come back with 4. Cheers!



Re: Issue with native protocol

2021-07-29 Thread Sam Tunnicliffe
Assuming that the one node doesn't have 
native_transport_max_negotiable_protocol_version=3 in cassandra.yaml, you could 
check its log for 
"Detected peers which do not fully support protocol V4. Capping max negotiable 
version to V3". 
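A small Python sketch of the log check suggested here: it returns every line that contains the capping message quoted above. The function name is hypothetical, and where the log lives depends on your installation (`/var/log/cassandra/system.log` is a common package default, but that is an assumption).

```python
CAP_MESSAGE = "Detected peers which do not fully support protocol V4"


def find_cap_warnings(lines):
    """Return (line_number, text) for every log line that mentions the V3 cap."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if CAP_MESSAGE in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

Passing an open file object works because it iterates line by line, e.g. `find_cap_warnings(open(path, encoding="utf-8", errors="replace"))`.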

The details are in CASSANDRA-15193, but the tl;dr is that a serialisation bug 
affecting paging in mixed version clusters means that it was/is not ideal to 
support V4 in a cluster containing both 2.x and 3.x nodes. Each 3.x node 
determines the max protocol version it should support based on the advertised 
versions of its peers. It's possible that the affected node missed an update 
regarding one of its peers and so is incorrectly enforcing the cap. If that is 
the case then restarting that node should prompt it to reevaluate the cap.
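To make the mechanism concrete, here is a simplified Python model of the cap as described above: any peer this node believes is running 2.x limits negotiation to V3, and `native_transport_max_negotiable_protocol_version`, if set, can lower the limit further. It is not Cassandra's code (CASSANDRA-15193 has the real change); the function name and the `server_max=4` default for a 3.11 node are assumptions.

```python
def max_negotiable_version(peer_release_versions, configured_max=None, server_max=4):
    """Simplified model of the peer-based protocol cap (not Cassandra's implementation).

    peer_release_versions: release_version strings this node holds for its peers,
      including stale entries such as a ghost row left in system.peers.
    configured_max: native_transport_max_negotiable_protocol_version, if set.
    """
    limit = server_max
    if configured_max is not None:
        limit = min(limit, configured_max)
    for version in peer_release_versions:
        major = int(version.split(".")[0])
        if major < 3:
            # A single 2.x peer is enough to cap this node at V3.
            limit = min(limit, 3)
    return limit
```

With one stale `2.2.5` entry, like the system.peers output in this thread, the model returns 3; once that entry is gone and the node re-evaluates, it returns 4.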



> On 29 Jul 2021, at 07:54, Erick Ramirez  wrote:
> 
> Thanks, Pekka. But we know from an earlier post from Srinivas that the driver 
> is trying to negotiate with v4 but the node wouldn't:
> 
> [2021-07-09 23:26:52.382 -0700]  
> com.datastax.driver.core.Connection - DEBUG: Got unsupported protocol version 
> error from /: for version V4 server supports version V3
> [2021-07-09 23:26:52.382 -0700]  
> com.datastax.driver.core.Connection - DEBUG: Connection[//: -1, 
> inFlight=0, closed=true] closing connection
> [2021-07-09 23:26:52.382 -0700]  
> com.datastax.driver.core.Host.STATES - DEBUG: [//:] 
> Connection[/10.39.38.166:9042-1, inFlight=0, closed=true] closed, remaining = 0
> [2021-07-09 23:26:52.383 -0700]  com.datastax.driver.core.Cluster - 
> DEBUG: Cannot connect with protocol V4, trying V3
> 
> So we know it's just the one problematic node in the cluster which won't 
> negotiate. The SHOW VERSION in cqlsh also indicates v3 but I can't figure out 
> what could be triggering it. Cheers!



Re: High memory usage during nodetool repair

2021-07-29 Thread Amandeep Srivastava
Thanks, Bowen. I don't think that's an issue, but yes, I can try upgrading to
3.11.5 and limiting the Merkle tree size to bring down the memory utilization.

Thanks, Erick, let me try that.

Can someone please share documentation on the internal functioning of full
repairs, if any exists? I want to understand the separate roles of heap and
off-heap memory during the process.

Also, in my case, once the nodes reach 95% memory usage, they stay there for
almost 10-12 hours after the repair is complete before falling back to 65%.
Any pointers on what might be consuming off-heap memory for so long, and can
something be done to clear it earlier?

Thanks,
Aman


Re: Issue with native protocol

2021-07-29 Thread Erick Ramirez
Thanks, Pekka. But we know from an earlier post from Srinivas that the
driver is trying to negotiate with v4 but the node wouldn't:

[2021-07-09 23:26:52.382 -0700] 
com.datastax.driver.core.Connection - DEBUG: Got unsupported protocol
version error from /: for version V4 server supports version V3
[2021-07-09 23:26:52.382 -0700] 
com.datastax.driver.core.Connection - DEBUG: Connection[//: -1,
inFlight=0, closed=true] closing connection
[2021-07-09 23:26:52.382 -0700] 
com.datastax.driver.core.Host.STATES - DEBUG: [//:]
Connection[/10.39.38.166:9042-1, inFlight=0, closed=true] closed, remaining
= 0
[2021-07-09 23:26:52.383 -0700]  com.datastax.driver.core.Cluster -
DEBUG: Cannot connect with protocol V4, trying V3

So we know it's just the one problematic node in the cluster which won't
negotiate. The SHOW VERSION in cqlsh also indicates v3 but I can't figure
out what could be triggering it. Cheers!
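The DEBUG lines above show the driver's fallback: it asks for V4, the node rejects it and reports V3, and the driver retries with V3. Below is a toy Python model of that exchange; `UnsupportedProtocolVersion`, `server_startup` and `negotiate` are hypothetical names, not DataStax driver APIs. One plausible link to the `CodecNotFoundException` reported earlier in this thread (an inference, not confirmed here): V3 has no native `smallint` type, so such columns are described to a V3 client as the custom type `org.apache.cassandra.db.marshal.ShortType`, which the driver cannot map to `java.lang.Short`.

```python
class UnsupportedProtocolVersion(Exception):
    """Toy stand-in for the node's 'unsupported protocol version' error."""

    def __init__(self, requested, server_max):
        super().__init__(f"for version V{requested} server supports version V{server_max}")
        self.server_max = server_max


def server_startup(requested, server_max):
    """Toy node: accept any version up to server_max, reject anything newer."""
    if requested > server_max:
        raise UnsupportedProtocolVersion(requested, server_max)
    return requested


def negotiate(server_max, preferred=4, lowest=3):
    """Try `preferred` first, then step down after each rejection.

    Returns (agreed_version, versions_tried).
    """
    tried = []
    version = preferred
    while version >= lowest:
        tried.append(version)
        try:
            return server_startup(version, server_max), tried
        except UnsupportedProtocolVersion as err:
            # Move to what the node says it supports, never upward.
            version = min(version - 1, err.server_max)
    raise RuntimeError(f"no protocol version >= V{lowest} accepted; tried {tried}")
```

`negotiate(server_max=3)` returns `(3, [4, 3])`, mirroring "Cannot connect with protocol V4, trying V3", while a node that is not capped gives `(4, [4])`.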


Re: Issue with native protocol

2021-07-29 Thread Pekka Enberg
Hi,

On Thu, Jul 29, 2021 at 9:44 AM Erick Ramirez wrote:
> When you restart C*, you should have an entry in the logs that looks like 
> this, indicating that it defaults to v4:
>
> INFO  [main] 2021-07-28 20:45:31,178 StorageService.java:650 - Native 
> protocol supported versions: 3/v3, 4/v4, 5/v5-beta (default: 4/v4)
>
> I'm hoping someone else here on the mailing list can give pointers as to why 
> a 3.11 node would advertise v3. I've been code-diving and scratching my head. 
> I can't think of a scenario that would lead to this:
>
> [cqlsh 5.0.1 | Cassandra 3.11.5 | CQL spec 3.4.4 | Native protocol v3]

If possible, one way to debug this is to use Wireshark's CQL support
to trace the CQL server/client communication. You can see what CQL
binary protocol version the client asks for (in the header), and also
any possible errors/fallback from the server. This should allow you to
narrow down if this is a client or server issue.
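Wireshark's CQL dissector decodes these fields for you. If you only have raw bytes (for example, exported from a capture), this Python sketch decodes the 9-byte frame header used by native protocol v3 and v4: version, flags, stream, opcode and body length. The high bit of the first byte marks a response, and the low seven bits carry the protocol version the sender put on the wire. Protocol v1 and v2 used an 8-byte header with a 1-byte stream ID, which this does not handle; the opcode table lists only a subset.

```python
import struct

# Subset of native protocol opcodes.
OPCODES = {0x00: "ERROR", 0x01: "STARTUP", 0x02: "READY", 0x03: "AUTHENTICATE",
           0x05: "OPTIONS", 0x06: "SUPPORTED", 0x07: "QUERY", 0x08: "RESULT"}


def decode_header(frame):
    """Decode the 9-byte v3/v4 frame header at the start of `frame` (bytes)."""
    if len(frame) < 9:
        raise ValueError("need at least 9 bytes for a v3/v4 header")
    # Big-endian: version byte, flags byte, signed 16-bit stream, opcode byte, 32-bit length.
    version_byte, flags, stream, opcode, length = struct.unpack(">BBhBI", frame[:9])
    return {
        "version": version_byte & 0x7F,             # protocol version in use
        "is_response": bool(version_byte & 0x80),   # set on server -> client frames
        "flags": flags,
        "stream": stream,
        "opcode": OPCODES.get(opcode, hex(opcode)),
        "body_length": length,
    }
```

Comparing the `version` of the client's STARTUP with what comes back is one way to tell whether the client or the server is choosing V3.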

Hope this helps!

Regards,

- Pekka


Re: Issue with native protocol

2021-07-29 Thread Erick Ramirez
When you restart C*, you should have an entry in the logs that looks like
this, indicating that it defaults to v4:

INFO  [main] 2021-07-28 20:45:31,178 StorageService.java:650 - Native
protocol supported versions: 3/v3, 4/v4, 5/v5-beta (default: 4/v4)

I'm hoping someone else here on the mailing list can give pointers as to
why a 3.11 node would advertise v3. I've been code-diving and scratching my
head. I can't think of a scenario that would lead to this:

[cqlsh 5.0.1 | Cassandra 3.11.5 | CQL spec 3.4.4 | Native protocol v3]
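To check the startup line quoted above across several nodes, a small Python parser can pull out the supported versions and the default. The regular expression is written against the single example line in this message, so treat its shape as an assumption and adjust it if your log format differs; `parse_native_protocol_line` is a hypothetical name.

```python
import re

# Written against the example line quoted above (assumed format).
_SUPPORTED = re.compile(
    r"Native protocol supported versions: (?P<versions>.*?) \(default: (?P<default>\d+)/"
)


def parse_native_protocol_line(line):
    """Return (supported_version_numbers, default_version), or None if no match."""
    match = _SUPPORTED.search(line)
    if match is None:
        return None
    # "3/v3, 4/v4, 5/v5-beta" -> [3, 4, 5]
    supported = [int(item.split("/")[0]) for item in match.group("versions").split(", ")]
    return supported, int(match.group("default"))
```

For the example line above, this returns `([3, 4, 5], 4)`.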