RE: Re: Unable to gossip with peers when starting cluster

2022-11-11 Thread Ben Klein

0.9 was never a seed before.

Based on your comment, I also tried, from having all three nodes up 
(following the initial bootstrap), restarting 0.7. This failed with the 
same error.



On 2022/11/09 15:37:24 Jeff Jirsa wrote:
> When you say you configured them to talk to .0.31 as a seed, did you do
> that by changing the yaml?
>
> Was 0.9 ever a seed before?
>
> I expect if you start 0.7 and 0.9 at the same time, it all works. This
> looks like a logic/state bug that needs to be fixed, though.
>
> (If you're going to upgrade, usually you start with all 3 hosts up, and
> restart one at a time. Starting with 0 online is likely poorly tested, and
> we should fix that).

Re: Unable to gossip with peers when starting cluster

2022-11-09 Thread Jeff Jirsa
When you say you configured them to talk to .0.31 as a seed, did you do
that by changing the yaml?

Was 0.9 ever a seed before?

I expect if you start 0.7 and 0.9 at the same time, it all works. This
looks like a logic/state bug that needs to be fixed, though.

(If you're going to upgrade, usually you start with all 3 hosts up, and
restart one at a time. Starting with 0 online is likely poorly tested, and
we should fix that).



On Wed, Nov 9, 2022 at 7:08 AM Klein, Benjamin E (PERATON) <
benjamin.e.kl...@peraton.com> wrote:

> I am trying to upgrade a three-node Cassandra cluster (192.168.0.31,
> 192.168.0.7, and 192.168.0.9) from 3.11 to 4.0.3. At the start of the
> process, all three nodes are down. I have configured all three nodes to
> have 192.168.0.31:7000 as their only seed.
>
> I am trying to bring all three nodes up, one at a time. Starting Node 1
> (.31) works just fine. However, Node 2 (.7) fails to start with the error
> message "Unable to gossip with any peers". The configuration file and log
> from Node 2 are attached (the log has had lines related to loading
> individual tables snipped); the relevant portion of the log is at the
> bottom of this message. Note that this node was able to successfully
> connect to the other seed node.
>
> I have already tried the following unsuccessfully:
>
> * Starting with a completely blank (i.e., newly formatted) /data drive on
> all nodes. This worked fine the first time the cluster started; however,
> attempting to restart the cluster gives the same error.
> * Ensuring that all clocks are synchronized to the same NTP servers, which
> have a ping time to all three nodes of approximately 0.5-1.0ms
> * Setting the cross_node_timeout configuration entry to false
> * Setting the internode_tcp_connect_timeout_in_ms configuration entry to
> 2
> * Adding an entry for each node in its /etc/hosts file (e.g., Node 1 gets
> the entry "192.168.0.31 node-1")
>
> Is there anything else I should try?
>
> ---
> Relevant portion of Cassandra log:
> INFO  [main] 2022-11-04 16:57:02,541 StorageService.java:755 - Loading
> persisted ring state
> INFO  [main] 2022-11-04 16:57:02,541 StorageService.java:838 - Populating
> token metadata from system tables
> INFO  [GossipStage:1] 2022-11-04 16:57:02,570 Gossiper.java:1969 - Adding /
> 192.168.0.31:7000 as there was no previous epState; new state is
> EndpointState: HeartBeatState = HeartBeat: generation = 0, version = -1,
> AppStateMap = {}
> INFO  [GossipStage:1] 2022-11-04 16:57:02,570 Gossiper.java:1969 - Adding /
> 192.168.0.9:7000 as there was no previous epState; new state is
> EndpointState: HeartBeatState = HeartBeat: generation = 0, version = -1,
> AppStateMap = {}
> INFO  [main] 2022-11-04 16:57:02,705 InboundConnectionInitiator.java:127 -
> Listening on address: (/192.168.0.7:7000), nic: eth0, encryption:
> unencrypted
> INFO  [Messaging-EventLoop-3-3] 2022-11-04 16:57:02,993
> OutboundConnection.java:1150 - /192.168.0.7:7000(/192.168.0.7:55882
> )->/192.168.0.31:7000-URGENT_MESSAGES-ef0bde62 successfully connected,
> version = 12, framing = CRC, encryption = unencrypted
> INFO  [Messaging-EventLoop-3-6] 2022-11-04 16:57:07,938
> NoSpamLogger.java:92 - 
> /192.168.0.7:7000->/192.168.0.9:7000-URGENT_MESSAGES-[no-channel]
> failed to connect
> io.netty.channel.AbstractChannel$AnnotatedConnectException:
> finishConnect(..) failed: Connection refused: /192.168.0.9:7000
> Caused by: java.net.ConnectException: finishConnect(..) failed: Connection
> refused
> at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
> at io.netty.channel.unix.Socket.finishConnect(Socket.java:251)
> at
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:673)
> at
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:650)
> at
> io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:530)
> at
> io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:470)
> at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
> at
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> at
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> at
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Exception (java.lang.RuntimeException) encountered during startup: Unable
> to gossip with any peers
> java.lang.RuntimeException: Unable to gossip with any peers
> at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1844)
> at
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:650)
> at
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:936)
> at
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:786)
> 
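
For readers following the log excerpt above, here is a minimal Python sketch (not part of the original thread) that pulls the per-peer connection outcome out of such an excerpt. The regular expression and the function name `connection_outcomes` are illustrative assumptions based only on the line shapes quoted here; they are not a Cassandra API.

```python
import re

# Assumed line shape, taken from the excerpt in this thread:
#   ...->/<peer ip>:<port>-URGENT_MESSAGES-<id> <outcome>
# The outcome may sit on the next line because the mail client wrapped it.
CONNECTION_RE = re.compile(
    r"->/(?P<peer>\d+\.\d+\.\d+\.\d+:\d+)-URGENT_MESSAGES-\S+\s+"
    r"(?P<outcome>successfully connected|failed to connect)"
)


def connection_outcomes(log_text: str) -> dict:
    """Map each peer address to the last connection outcome seen for it."""
    # Drop mail quote markers ("> ", "> > ") so quoted, wrapped lines still match.
    unquoted = re.sub(r"^(> ?)+", "", log_text, flags=re.MULTILINE)
    return {m["peer"]: m["outcome"] for m in CONNECTION_RE.finditer(unquoted)}
```

Applied to the excerpt quoted above, this should list 192.168.0.31:7000 as connected and 192.168.0.9:7000 as refused, which is consistent with .9 still being down when .7 was started.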

Re: Unable to Gossip

2021-09-10 Thread Joe Obernberger
Oh!  Excellent!  Doh!  That was it.  When we add a new system, we use 
Puppet to push things out, like NTP.  Well, this is our first Rocky 
Linux install, and guess what I didn't do?

Thank you Song.  The new machine is now joining the cluster.

nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.100.251  490.67 GiB  200     38.7%             660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  172.16.100.208  76.31 GiB   30      5.8%              2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
UN  172.16.100.252  504.13 GiB  200     38.6%             e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  519.29 GiB  200     38.6%             49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   526.47 GiB  200     38.6%             d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   523.19 GiB  200     38.6%             93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  11.42 GiB   4       0.8%              a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  526.61 GiB  200     38.7%             4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UJ  172.16.100.44   179.98 KiB  200     ?                 b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1
UN  172.16.100.37   315.89 GiB  120     23.2%             08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  465.48 GiB  200     38.6%             b74b6e65-af63-486a-b07f-9e304ec30a39  rack1
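
The two-letter code at the start of each row follows the legend printed above the table (Status=Up/Down, State=Normal/Leaving/Joining/Moving). As a small illustration, and not something posted in the thread, a Python sketch that decodes one row might look like the following; `decode_row` is a hypothetical helper name, and it assumes the row starts with the code followed by the address.

```python
# Legend from the `nodetool status` header shown above.
STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def decode_row(row: str) -> tuple:
    """Return (address, status, state) for one `nodetool status` row."""
    # Strip whitespace and any mail-client emphasis asterisks around the row.
    fields = row.strip().strip("*").split()
    code, address = fields[0], fields[1]
    return address, STATUS[code[0]], STATE[code[1]]


row = "UJ  172.16.100.44   179.98 KiB  200 ? b2e5366e-8386-40ec-a641-27944a5a7cfa  rack1"
print(decode_row(row))
```

For the new node's row this yields the address 172.16.100.44 with status Up and state Joining, which is what "the new machine is now joining the cluster" refers to.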


Cheers!

-Joe

On 9/10/2021 1:25 PM, Bowen Song wrote:


Hello Joe,


These logs indicate the clocks are out of sync (by over 4.2 hours) 
between the new node and the seed nodes:


INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567
MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were
dropped in last 5000 ms: 0 internal and 1 cross node. Mean
internal dropped latency: 0 ms and Mean cross-node dropped
latency: 15137813 ms
INFO  [ScheduledTasks:1] 2021-09-10 11:14:36,594
MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were
dropped in last 5000 ms: 0 internal and 1 cross node. Mean
internal dropped latency: 0 ms and Mean cross-node dropped
latency: 15137813 ms
INFO  [ScheduledTasks:1] 2021-09-10 11:18:42,653
MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were
dropped in last 5000 ms: 0 internal and 1 cross node. Mean
internal dropped latency: 0 ms and Mean cross-node dropped
latency: 15137813 ms

Can you please check that the NTP client is running on all servers and 
the clocks are in sync?



Cheers,

Bowen



On 10/09/2021 16:18, Joe Obernberger wrote:


Good idea.
There are two seed nodes:
I see this on one (note 172.16.100.44 is the new node):

DEBUG [CompactionExecutor:1345] 2021-09-10 11:13:49,569 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully 
expired SSTables
INFO  [Messaging-EventLoop-3-10] 2021-09-10 11:14:22,810 
InboundConnectionInitiator.java:464 - 
/172.16.100.44:7000(/172.16.100.44:45970)->/172.16.100.253:7000-URGENT_MESSAGES-30a4fd82 
messaging connection established, version = 12, framing = LZ4, 
encryption = unencrypted
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 
MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were dropped 
in last 5000 ms: 0 internal and 1 cross node. Mean internal dropped 
latency: 0 ms and Mean cross-node dropped latency: 15137813 ms

Re: Unable to Gossip

2021-09-10 Thread Bowen Song

Hello Joe,


These logs indicate the clocks are out of sync (by over 4.2 hours) 
between the new node and the seed nodes:


   INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567
   MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were dropped
   in last 5000 ms: 0 internal and 1 cross node. Mean internal dropped
   latency: 0 ms and Mean cross-node dropped latency: 15137813 ms
   INFO  [ScheduledTasks:1] 2021-09-10 11:14:36,594
   MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were dropped
   in last 5000 ms: 0 internal and 1 cross node. Mean internal dropped
   latency: 0 ms and Mean cross-node dropped latency: 15137813 ms
   INFO  [ScheduledTasks:1] 2021-09-10 11:18:42,653
   MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were dropped
   in last 5000 ms: 0 internal and 1 cross node. Mean internal dropped
   latency: 0 ms and Mean cross-node dropped latency: 15137813 ms

Can you please check that the NTP client is running on all servers and 
the clocks are in sync?
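
To make the "over 4.2 hours" figure concrete, here is a short Python sketch (an illustration added for readers, not code from the thread) that extracts the cross-node dropped latency from such a log line and converts it to hours. The regular expression and the name `cross_node_latency` are assumptions based on the message format quoted above; treating the value as clock skew follows the reading given in this message.

```python
import re
from datetime import timedelta
from typing import Optional

# \s+ between words tolerates the line wrapping seen in mailed logs.
LATENCY_RE = re.compile(r"Mean\s+cross-node\s+dropped\s+latency:\s+(\d+)\s+ms")


def cross_node_latency(log_text: str) -> Optional[timedelta]:
    """Return the first cross-node dropped latency found, or None."""
    match = LATENCY_RE.search(log_text)
    if match is None:
        return None
    return timedelta(milliseconds=int(match.group(1)))


line = (
    "GOSSIP_DIGEST_SYN messages were dropped in last 5000 ms: 0 internal and "
    "1 cross node. Mean internal dropped latency: 0 ms and Mean cross-node "
    "dropped latency: 15137813 ms"
)
skew = cross_node_latency(line)
print(skew.total_seconds() / 3600)  # roughly 4.2 hours
```

A latency of that size on a LAN is implausible as real network delay, which is why it points to the clocks rather than the network.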



Cheers,

Bowen



On 10/09/2021 16:18, Joe Obernberger wrote:


Good idea.
There are two seed nodes:
I see this on one (note 172.16.100.44 is the new node):

DEBUG [CompactionExecutor:1345] 2021-09-10 11:13:49,569 
TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully 
expired SSTables
INFO  [Messaging-EventLoop-3-10] 2021-09-10 11:14:22,810 
InboundConnectionInitiator.java:464 - 
/172.16.100.44:7000(/172.16.100.44:45970)->/172.16.100.253:7000-URGENT_MESSAGES-30a4fd82 
messaging connection established, version = 12, framing = LZ4, 
encryption = unencrypted
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 
MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were dropped in 
last 5000 ms: 0 internal and 1 cross node. Mean internal dropped 
latency: 0 ms and Mean cross-node dropped latency: 15137813 ms

Re: Unable to Gossip

2021-09-10 Thread vytenis silgalis
Hmm, are the ports open on the `new` server?

Looks like it can connect to other nodes but other nodes can't connect to
it.
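
One way to test this from an existing node, sketched here in Python as an illustration rather than anything posted in the thread, is to attempt a plain TCP connection to the new node's storage port. Port 7000 is taken from the logs above; the helper name `can_connect` and the timeout value are assumptions.

```python
import socket


def can_connect(host: str, port: int = 7000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Example, run from a seed node against the new node:
#   can_connect("172.16.100.44")
```

This only shows that the port accepts connections; it does not prove that gossip messages are accepted once the connection is open.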

-Vy

On Fri, Sep 10, 2021 at 10:20 AM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Good idea.
> There are two seed nodes:
> I see this on one (note 172.16.100.44 is the new node):
>
> DEBUG [CompactionExecutor:1345] 2021-09-10 11:13:49,569
> TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully
> expired SSTables
> INFO  [Messaging-EventLoop-3-10] 2021-09-10 11:14:22,810
> InboundConnectionInitiator.java:464 - /172.16.100.44:7000
> (/172.16.100.44:45970)->/172.16.100.253:7000-URGENT_MESSAGES-30a4fd82
> messaging connection established, version = 12, framing = LZ4, encryption =
> unencrypted
> INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 MessagingMetrics.java:206
> - GOSSIP_DIGEST_SYN messages were dropped in last 5000 ms: 0 internal and 1
> cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped
> latency: 15137813 ms

Re: Unable to Gossip

2021-09-10 Thread Joe Obernberger

Good idea.
There are two seed nodes:
I see this on one (note 172.16.100.44 is the new node):

DEBUG [CompactionExecutor:1345] 2021-09-10 11:13:49,569 TimeWindowCompactionStrategy.java:129 - TWCS skipping check for fully expired SSTables
INFO  [Messaging-EventLoop-3-10] 2021-09-10 11:14:22,810 InboundConnectionInitiator.java:464 - /172.16.100.44:7000(/172.16.100.44:45970)->/172.16.100.253:7000-URGENT_MESSAGES-30a4fd82 messaging connection established, version = 12, framing = LZ4, encryption = unencrypted
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 MessagingMetrics.java:206 - GOSSIP_DIGEST_SYN messages were dropped in last 5000 ms: 0 internal and 1 cross node. Mean internal dropped latency: 0 ms and Mean cross-node dropped latency: 15137813 ms
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 StatusLogger.java:65 - Pool Name                     Active  Pending  Completed  Blocked  All Time Blocked
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 StatusLogger.java:69 - ReadStage                     0       0        4729810    0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 StatusLogger.java:69 - CompactionExecutor            0       0        384171     0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,567 StatusLogger.java:69 - MutationStage                 0       0        14540487   0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - MemtableReclaimMemory         0       0        316        0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - PendingRangeCalculator        0       0        11         0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - GossipStage                   0       0        1126031    0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - SecondaryIndexManagement      0       0        0          0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - HintsDispatcher               0       0        15         0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - Native-Transport-Requests     0       0        13286230   0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - RequestResponseStage          0       0        15724485   0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - MemtableFlushWriter           0       0        298        0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - PerDiskMemtableFlushWriter_0  0       0        316        0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,568 StatusLogger.java:69 - MemtablePostFlush             0       0        336        0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,569 StatusLogger.java:69 - Sampler                       0       0        0          0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,569 StatusLogger.java:69 - ValidationExecutor            0       0        0          0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,569 StatusLogger.java:69 - ViewBuildExecutor             0       0        0          0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,569 StatusLogger.java:69 - CacheCleanupExecutor          0       0        0          0        0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,569 StatusLogger.java:79 - CompactionManager             0       0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:91 - MessagingService              n/a     0/0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:101 - Cache Type  Size      Capacity   KeysToSave
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:103 - KeyCache    75539240  104857600  all
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:109 - RowCache    0         0          all
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:116 - Table                          Memtable ops,data
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:119 - system_schema.columns          0,0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:119 - system_schema.types            0,0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:119 - system_schema.indexes          0,0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:119 - system_schema.keyspaces        0,0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,570 StatusLogger.java:119 - system_schema.dropped_columns  0,0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,571 StatusLogger.java:119 - system_schema.aggregates       0,0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,571 StatusLogger.java:119 - system_schema.triggers         0,0
INFO  [ScheduledTasks:1] 2021-09-10 11:14:26,571 StatusLogger.java:119 - 

Re: Unable to Gossip

2021-09-10 Thread Jeff Jirsa
Can you drop as much info as possible into a JIRA?

Include the output of `nodetool gossipinfo` if at all possible



On Fri, Sep 10, 2021 at 7:58 AM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Thank you Jeff - yes, this is on the latest 4.0.1
>
> nodetool version
> ReleaseVersion: 4.0.1
> nodetool status
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load        Tokens  Owns (effective)  Host ID                               Rack
> UN  172.16.100.251  488.38 GiB  200     38.7%             660f476c-a124-4ca0-b55f-75efe56370da  rack1
> UN  172.16.100.208  76.02 GiB   30      5.8%              2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
> UN  172.16.100.252  501.88 GiB  200     38.6%             e83aa851-69b4-478f-88f6-60e657ea6539  rack1
> UN  172.16.100.249  517.27 GiB  200     38.6%             49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
> UN  172.16.100.36   524.45 GiB  200     38.6%             d9702f96-256e-45ae-8e12-69a42712be50  rack1
> UN  172.16.100.39   521.05 GiB  200     38.6%             93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
> UN  172.16.100.253  11.39 GiB   4       0.8%              a1a16910-9167-4174-b34b-eb859d36347e  rack1
> UN  172.16.100.248  524.46 GiB  200     38.7%             4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
> UN  172.16.100.37   314.67 GiB  120     23.2%             08a19658-40be-4e55-8709-812b3d4ac750  rack1
> UN  172.16.100.250  464.23 GiB  200     38.6%             b74b6e65-af63-486a-b07f-9e304ec30a39  rack1
>
> yum list installed | grep cass
> cassandra.noarch 4.0.1-1   @cassandra
>
> -Joe
> On 9/10/2021 10:54 AM, Jeff Jirsa wrote:
>
> Is this on 4.0.0 ? 4.0.1 fixes an issue where the gossip result is too
> large for the urgent message queue, causing this stack trace, and was
> released 3 days ago. I've never seen it on a 10 node cluster before, but
> I'd be trying that.
>
> On Fri, Sep 10, 2021 at 7:50 AM Joe Obernberger <
> joseph.obernber...@gmail.com> wrote:
>
>> I have a 10 node cluster and am trying to add another node.  The new
>> node is running Rocky Linux and I'm getting the unable to gossip with
>> any peers error.  Firewall and SELinux are off.  I can ping all the
>> other nodes OK.  I've checked everything I can think of (/etc/hosts,
>> listen_address, broadcast etc..).  It all looks correct to me.
>> Any ideas?  Could it be an incompatibility with Rocky?
>>
>> DEBUG [main] 2021-09-10 06:45:24,846 YamlConfigurationLoader.java:112 -
>> Loading settings from file:/etc/cassandra/default.conf/cassandra.yaml
>> INFO  [Messaging-EventLoop-3-6] 2021-09-10 06:45:24,921
>> OutboundConnection.java:1150 -
>> /172.16.100.44:7000(/172.16.100.44:45934)->/172.16.100.253:7000-URGENT_MESSAGES-90efbb9e
>>
>> successfully connected, version = 12, framing = LZ4, encryption =
>> unencrypted
>> INFO  [Messaging-EventLoop-3-3] 2021-09-10 06:45:24,930
>> OutboundConnection.java:1150 -
>> /172.16.100.44:7000(/172.16.100.44:44320)->/172.16.100.37:7000-URGENT_MESSAGES-eae47864
>>
>> successfully connected, version = 12, framing = LZ4, encryption =
>> unencrypted
>> INFO  [ScheduledTasks:1] 2021-09-10 06:45:27,648 TokenMetadata.java:525
>> - Updating topology for all endpoints that have changed
>> DEBUG [OptionalTasks:1] 2021-09-10 06:45:54,644
>> SizeEstimatesRecorder.java:65 - Node is not part of the ring; not
>> recording size estimates
>> ERROR [main] 2021-09-10 06:46:25,891 CassandraDaemon.java:909 -
>> Exception encountered during startup
>> java.lang.RuntimeException: Unable to gossip with any peers
>>  at
>> org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1805)
>>  at
>>
>> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:648)
>>  at
>>
>> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
>>  at
>>
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>>  at
>>
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>>  at
>>
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
>>  at
>>
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
>>  at
>>
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
>> DEBUG [StorageServiceShutdownHook] 2021-09-10 06:46:25,896
>> StorageService.java:1621 - DRAINING: starting drain process
>> INFO  [StorageServiceShutdownHook] 2021-09-10 06:46:25,898
>> HintsService.java:220 - Paused hints dispatch
>> WARN  [StorageServiceShutdownHook] 2021-09-10 06:46:25,899
>> Gossiper.java:1993 - No local state, state is in silent shutdown, or
>> node hasn't joined, not announcing shutdown
>>
>> Thank you!
>>
>> -Joe
>>
>>
>

Re: Unable to Gossip

2021-09-10 Thread Joe Obernberger

Thank you Jeff - yes, this is on the latest 4.0.1

nodetool version
ReleaseVersion: 4.0.1
nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.100.251  488.38 GiB  200     38.7%             660f476c-a124-4ca0-b55f-75efe56370da  rack1
UN  172.16.100.208  76.02 GiB   30      5.8%              2529b6ed-cdb2-43c2-bdd7-171cfe308bd3  rack1
UN  172.16.100.252  501.88 GiB  200     38.6%             e83aa851-69b4-478f-88f6-60e657ea6539  rack1
UN  172.16.100.249  517.27 GiB  200     38.6%             49e4f571-7d1c-4e1e-aca7-5bbe076596f7  rack1
UN  172.16.100.36   524.45 GiB  200     38.6%             d9702f96-256e-45ae-8e12-69a42712be50  rack1
UN  172.16.100.39   521.05 GiB  200     38.6%             93f9cb0f-ea71-4e3d-b62a-f0ea0e888c47  rack1
UN  172.16.100.253  11.39 GiB   4       0.8%              a1a16910-9167-4174-b34b-eb859d36347e  rack1
UN  172.16.100.248  524.46 GiB  200     38.7%             4bbbe57c-6219-41e5-bbac-de92a9594d53  rack1
UN  172.16.100.37   314.67 GiB  120     23.2%             08a19658-40be-4e55-8709-812b3d4ac750  rack1
UN  172.16.100.250  464.23 GiB  200     38.6%             b74b6e65-af63-486a-b07f-9e304ec30a39  rack1


yum list installed | grep cass
cassandra.noarch 4.0.1-1   @cassandra

-Joe

On 9/10/2021 10:54 AM, Jeff Jirsa wrote:
Is this on 4.0.0 ? 4.0.1 fixes an issue where the gossip result is too 
large for the urgent message queue, causing this stack trace, and was 
released 3 days ago. I've never seen it on a 10 node cluster before, 
but I'd be trying that.


On Fri, Sep 10, 2021 at 7:50 AM Joe Obernberger 
 wrote:


I have a 10 node cluster and am trying to add another node.  The new
node is running Rocky Linux and I'm getting the unable to gossip with
any peers error.  Firewall and SELinux are off.  I can ping all the
other nodes OK.  I've checked everything I can think of (/etc/hosts,
listen_address, broadcast etc..).  It all looks correct to me.
Any ideas?  Could it be an incompatibility with Rocky?

DEBUG [main] 2021-09-10 06:45:24,846 YamlConfigurationLoader.java:112 - Loading settings from file:/etc/cassandra/default.conf/cassandra.yaml
INFO  [Messaging-EventLoop-3-6] 2021-09-10 06:45:24,921 OutboundConnection.java:1150 - /172.16.100.44:7000(/172.16.100.44:45934)->/172.16.100.253:7000-URGENT_MESSAGES-90efbb9e successfully connected, version = 12, framing = LZ4, encryption = unencrypted
INFO  [Messaging-EventLoop-3-3] 2021-09-10 06:45:24,930 OutboundConnection.java:1150 - /172.16.100.44:7000(/172.16.100.44:44320)->/172.16.100.37:7000-URGENT_MESSAGES-eae47864 successfully connected, version = 12, framing = LZ4, encryption = unencrypted
INFO  [ScheduledTasks:1] 2021-09-10 06:45:27,648 TokenMetadata.java:525 - Updating topology for all endpoints that have changed
DEBUG [OptionalTasks:1] 2021-09-10 06:45:54,644 SizeEstimatesRecorder.java:65 - Node is not part of the ring; not recording size estimates
ERROR [main] 2021-09-10 06:46:25,891 CassandraDaemon.java:909 - Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any peers
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1805)
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:648)
    at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
DEBUG [StorageServiceShutdownHook] 2021-09-10 06:46:25,896 StorageService.java:1621 - DRAINING: starting drain process
INFO  [StorageServiceShutdownHook] 2021-09-10 06:46:25,898 HintsService.java:220 - Paused hints dispatch
WARN  [StorageServiceShutdownHook] 2021-09-10 06:46:25,899 Gossiper.java:1993 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown

Thank you!

-Joe


 

Re: Unable to Gossip

2021-09-10 Thread Jeff Jirsa
Is this on 4.0.0 ? 4.0.1 fixes an issue where the gossip result is too
large for the urgent message queue, causing this stack trace, and was
released 3 days ago. I've never seen it on a 10 node cluster before, but
I'd be trying that.

On Fri, Sep 10, 2021 at 7:50 AM Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> I have a 10 node cluster and am trying to add another node.  The new
> node is running Rocky Linux and I'm getting the unable to gossip with
> any peers error.  Firewall and SELinux are off.  I can ping all the
> other nodes OK.  I've checked everything I can think of (/etc/hosts,
> listen_address, broadcast etc..).  It all looks correct to me.
> Any ideas?  Could it be an incompatibility with Rocky?
>
> DEBUG [main] 2021-09-10 06:45:24,846 YamlConfigurationLoader.java:112 -
> Loading settings from file:/etc/cassandra/default.conf/cassandra.yaml
> INFO  [Messaging-EventLoop-3-6] 2021-09-10 06:45:24,921
> OutboundConnection.java:1150 -
> /172.16.100.44:7000(/172.16.100.44:45934)->/172.16.100.253:7000-URGENT_MESSAGES-90efbb9e
>
> successfully connected, version = 12, framing = LZ4, encryption =
> unencrypted
> INFO  [Messaging-EventLoop-3-3] 2021-09-10 06:45:24,930
> OutboundConnection.java:1150 -
> /172.16.100.44:7000(/172.16.100.44:44320)->/172.16.100.37:7000-URGENT_MESSAGES-eae47864
>
> successfully connected, version = 12, framing = LZ4, encryption =
> unencrypted
> INFO  [ScheduledTasks:1] 2021-09-10 06:45:27,648 TokenMetadata.java:525
> - Updating topology for all endpoints that have changed
> DEBUG [OptionalTasks:1] 2021-09-10 06:45:54,644
> SizeEstimatesRecorder.java:65 - Node is not part of the ring; not
> recording size estimates
> ERROR [main] 2021-09-10 06:46:25,891 CassandraDaemon.java:909 -
> Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any peers
>  at
> org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1805)
>  at
>
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:648)
>  at
>
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:934)
>  at
>
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:784)
>  at
>
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
>  at
>
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
>  at
>
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
>  at
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
> DEBUG [StorageServiceShutdownHook] 2021-09-10 06:46:25,896
> StorageService.java:1621 - DRAINING: starting drain process
> INFO  [StorageServiceShutdownHook] 2021-09-10 06:46:25,898
> HintsService.java:220 - Paused hints dispatch
> WARN  [StorageServiceShutdownHook] 2021-09-10 06:46:25,899
> Gossiper.java:1993 - No local state, state is in silent shutdown, or
> node hasn't joined, not announcing shutdown
>
> Thank you!
>
> -Joe
>
>


Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

2019-08-28 Thread Michael Carlise
For the benefit of anybody who comes to this thread in the archive: this
might be an issue with Ec2MultiRegionSnitch altogether; I'm not sure.  If I
create a local 3-node cluster using ccm (Cassandra v3.11.4), I can drop the
keystore/truststore JKS files in and flip encryption on, and everything
works as expected.  Tomorrow I'll reach out on the Slack channel to see if
anybody can suggest ways to test it, or if anybody is aware of an ongoing
issue.

On Wed, Aug 28, 2019 at 2:49 PM Michael Carlise 
wrote:

> telnet from node 1 -> node2 7001 (and 7000) works.
>
> However, I can't rule out a JKS keystore/truststore issue.  I have tried a
> number of configurations and none of them have seemed to help (or emit any
> further error logging).   We have a root and intermediate CA cert, and a
> private key + signed CSR.  Our keystore has a single privateKeyentry of
> length 2: consisting of the signed CSR and the intermediate cert (in that
> order).  The truststore has a single entry of length one: consisting of the
> root cert used to issue the intermediate.  Does anybody know if that is the
> correct setup for JKS.  This setup was given to us by another team in our
> company that uses java much more than us.
>
> Some other points to note: Cassandra-9386 issue points out that 'dc'
> internode_encryption when using Ec2MultiRegionSnitch doesn't work correctly
> (always uses encrypted connections).  But I still can't get 'all' to work.
> The way I'm trying to get it to work is by just simply flipping encryption
> on in two non-seed nodes in the same datacenter.  I notice that in
> system.log I can see them both output the message 'Handshaking with
> /private IP'.  But then a few minutes later the unable to gossip exception
> is thrown.  No other information/logs are given; so I assume the handshake
> failed? presumably b/c incorrect truststore/keystore?
>
> I can't seem to find any concrete information about how to setup the
> keystore cert chain and/or the truststore. Does anybody know of any good
> sources on this topic, or know at the top of the minds how this setup is
> supposed to be?
>
>
> On Mon, Aug 26, 2019 at 10:01 PM Subroto Barua 
> wrote:
>
>> could be issue with keystore/trustore --- you may want to do keytool --
>> list  -- validate the files/password; also do md5sum on files from 1 node
>> in west and 1 node in east.
>> check ssl port 7001 --- from 1 node in west --> telnet > east>:7001 (or custom port if you are not using default port)
>>
>> On Monday, August 26, 2019, 05:46:19 PM PDT, Michael Carlise
>>  wrote:
>>
>>
>> Subroto -
>>
>> both tools error; openssl errno 111 - which made me check bound ports on
>> the c* node with encryption flipped.  Port 9042 is not open (determined by
>> netstat -ant).  Looking at the log differences for when a node is started
>> with/without encryption.  Without encryption, I get a bunch of lines like:
>>
>> OutboundTcpConnection.java:561 - Handshaking version w/ IP
>>
>> And this happens after a line like
>>
>> Gossiper.java - Waiting for gossip to settle...
>>
>> with encryption toggled to 'dc', I don't see any of those lines;
>> presumable b/c the gossiper is trying to start but doesn't.
>>
>> On Mon, Aug 26, 2019 at 6:51 PM Subroto Barua 
>> wrote:
>>
>> Michael,
>>
>> Are you able to connect to any c* node via OpenSSL?
>>
>> Openssl s_client -connect :9042
>>
>> Cqlsh  —ssl
>>
>> Subroto
>>
>> On Aug 26, 2019, at 2:47 PM, Marc Selwan 
>> wrote:
>>
>> which exact version of OpenJDK are you using? Is it possible you don't
>> have JCE on those nodes? (I believe more recent versions of Java 8 has this
>> baked in so that might not be it)
>>
>>
>> *Marc Selwan | *DataStax *| *PM, Server Team *|* *(925) 413-7079* *|*
>> Twitter 
>>
>>
>>
>>
>> On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise <
>> mcarl...@salesforce.com.invalid> wrote:
>>
>>
>> I originally opened this issue on stackoverflow (
>> https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception
>> 
>> ).
>>
>> However, I haven't gotten any responses in over a week.  I'm going to
>> post it here and maybe someone will have an idea on where I can look.
>>
>> We currently run a multi region cassandra cluster in AWS. It runs in four
>> regions, 12 nodes per region. It runs without node 

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

2019-08-28 Thread Michael Carlise
telnet from node 1 -> node2 7001 (and 7000) works.

However, I can't rule out a JKS keystore/truststore issue.  I have tried a
number of configurations and none of them seem to have helped (or emitted any
further error logging).  We have a root and an intermediate CA cert, and a
private key + signed CSR.  Our keystore has a single PrivateKeyEntry with a
certificate chain of length 2, consisting of the signed CSR and the
intermediate cert (in that order).  The truststore has a single entry of
length one, consisting of the root cert used to issue the intermediate.  Does
anybody know if that is the correct setup for JKS?  This setup was given to
us by another team in our company that uses Java much more than we do.

Some other points to note: CASSANDRA-9386 points out that the 'dc'
internode_encryption setting doesn't work correctly with Ec2MultiRegionSnitch
(it always uses encrypted connections).  But I still can't get 'all' to work.
The way I'm trying to get it to work is by simply flipping encryption on for
two non-seed nodes in the same datacenter.  In system.log I can see them both
output the message 'Handshaking with /private IP', but a few minutes later
the unable to gossip exception is thrown.  No other information or logs are
given, so I assume the handshake failed, presumably because of an incorrect
truststore/keystore?

I can't seem to find any concrete information about how to set up the
keystore cert chain and/or the truststore.  Does anybody know of any good
sources on this topic, or know off the top of their head how this setup is
supposed to look?
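
For anyone wanting to double-check a layout like the one described above (one private key entry with a two-certificate chain, and a truststore holding a single root), here is a minimal Java sketch that lists what a store actually contains. It is only an illustration: the class name `KeystoreSummary` and its methods are hypothetical, the store type is assumed to be JKS, and the path and password are passed on the command line purely for brevity. It does not verify that the chain validates against the truststore.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: summarizes the entries of a Java keystore or truststore.
final class KeystoreSummary {

    // Returns one line per alias: key entries with their chain (leaf first),
    // trusted-certificate entries with their subject.
    static List<String> describe(KeyStore ks) throws Exception {
        List<String> lines = new ArrayList<>();
        for (String alias : Collections.list(ks.aliases())) {
            if (ks.isKeyEntry(alias)) {
                Certificate[] chain = ks.getCertificateChain(alias);
                int length = (chain == null) ? 0 : chain.length;
                lines.add(alias + ": key entry, chain length " + length);
                for (int i = 0; i < length; i++) {
                    lines.add("  [" + i + "] " + subjectOf(chain[i]));
                }
            } else if (ks.isCertificateEntry(alias)) {
                lines.add(alias + ": trusted certificate, " + subjectOf(ks.getCertificate(alias)));
            }
        }
        return lines;
    }

    // Subject DN for X.509 certificates, otherwise just the certificate type.
    static String subjectOf(Certificate cert) {
        if (cert instanceof X509Certificate) {
            return ((X509Certificate) cert).getSubjectX500Principal().getName();
        }
        return cert.getType();
    }

    // Loads a store from disk; "JKS" is an assumption about the file format.
    static KeyStore load(String path, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance("JKS");
        try (InputStream in = new FileInputStream(path)) {
            ks.load(in, password);
        }
        return ks;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: KeystoreSummary <store.jks> <password>");
            return;
        }
        for (String line : describe(load(args[0], args[1].toCharArray()))) {
            System.out.println(line);
        }
    }
}
```

Run against the keystore, the setup described in this message would be expected to show one key entry with chain length 2; run against the truststore, one trusted certificate.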


On Mon, Aug 26, 2019 at 10:01 PM Subroto Barua 
wrote:

> could be issue with keystore/trustore --- you may want to do keytool --
> list  -- validate the files/password; also do md5sum on files from 1 node
> in west and 1 node in east.
> check ssl port 7001 --- from 1 node in west --> telnet :7001
> (or custom port if you are not using default port)
>
> On Monday, August 26, 2019, 05:46:19 PM PDT, Michael Carlise
>  wrote:
>
>
> Subroto -
>
> both tools error; openssl errno 111 - which made me check bound ports on
> the c* node with encryption flipped.  Port 9042 is not open (determined by
> netstat -ant).  Looking at the log differences for when a node is started
> with/without encryption.  Without encryption, I get a bunch of lines like:
>
> OutboundTcpConnection.java:561 - Handshaking version w/ IP
>
> And this happens after a line like
>
> Gossiper.java - Waiting for gossip to settle...
>
> with encryption toggled to 'dc', I don't see any of those lines;
> presumable b/c the gossiper is trying to start but doesn't.
>
> On Mon, Aug 26, 2019 at 6:51 PM Subroto Barua 
> wrote:
>
> Michael,
>
> Are you able to connect to any c* node via OpenSSL?
>
> Openssl s_client -connect :9042
>
> Cqlsh  —ssl
>
> Subroto
>
> On Aug 26, 2019, at 2:47 PM, Marc Selwan  wrote:
>
> which exact version of OpenJDK are you using? Is it possible you don't
> have JCE on those nodes? (I believe more recent versions of Java 8 has this
> baked in so that might not be it)
>
>
> *Marc Selwan | *DataStax *| *PM, Server Team *|* *(925) 413-7079* *|*
> Twitter 
>
>
>
>
> On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise <
> mcarl...@salesforce.com.invalid> wrote:
>
>
> I originally opened this issue on stackoverflow (
> https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception
> 
> ).
>
> However, I haven't gotten any responses in over a week.  I'm going to post
> it here and maybe someone will have an idea on where I can look.
>
> We currently run a multi region cassandra cluster in AWS. It runs in four
> regions, 12 nodes per region. It runs without node to node encryption (or
> client encryption either). We are trying to enable inter datacenter node to
> node encryption. However, when we flip encryption over we get an exception
> that nodes are unable to gossip with any peers.
>
> It could possibly be that we didn't build our jks keystore/truststores
> correctly (more on how we built these files below). But, we additionally do
> not see intra datacenter communication working (which should be set to
> unencrypted communication). Additionally, cqlsh cannot connect to the node
> either; even though we have (by default) client_auth_required set to false
> .
>
> ERROR [main] 2019-08-15 

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

2019-08-26 Thread Subroto Barua
Could be an issue with the keystore/truststore. You may want to run
keytool -list to validate the files/password; also run md5sum on the files
from 1 node in west and 1 node in east.
Check SSL port 7001: from 1 node in west, telnet :7001 (or the custom port if
you are not using the default port).
On Monday, August 26, 2019, 05:46:19 PM PDT, Michael Carlise 
 wrote:  
 
 Subroto -
both tools error; openssl errno 111 - which made me check bound ports on the c* 
node with encryption flipped.  Port 9042 is not open (determined by netstat 
-ant).  Looking at the log differences for when a node is started with/without 
encryption.  Without encryption, I get a bunch of lines like:
OutboundTcpConnection.java:561 - Handshaking version w/ IP
And this happens after a line like
Gossiper.java - Waiting for gossip to settle...
with encryption toggled to 'dc', I don't see any of those lines; presumable b/c 
the gossiper is trying to start but doesn't.
On Mon, Aug 26, 2019 at 6:51 PM Subroto Barua  
wrote:

Michael,
Are you able to connect to any c* node via OpenSSL?
Openssl s_client -connect :9042
Cqlsh  —ssl 
Subroto 
On Aug 26, 2019, at 2:47 PM, Marc Selwan  wrote:


which exact version of OpenJDK are you using? Is it possible you don't have JCE 
on those nodes? (I believe more recent versions of Java 8 has this baked in so 
that might not be it)

Marc Selwan | DataStax | PM, Server Team | (925) 413-7079 | Twitter 



On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise 
 wrote:


I originally opened this issue on stackoverflow 
(https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception).
  
However, I haven't gotten any responses in over a week.  I'm going to post it 
here and maybe someone will have an idea on where I can look.

We currently run a multi region cassandra cluster in AWS. It runs in four 
regions, 12 nodes per region. It runs without node to node encryption (or 
client encryption either). We are trying to enable inter datacenter node to 
node encryption. However, when we flip encryption over we get an exception that 
nodes are unable to gossip with any peers.

It could possibly be that we didn't build our jks keystore/truststores 
correctly (more on how we built these files below). But, we additionally do not 
see intra datacenter communication working (which should be set to unencrypted 
communication). Additionally, cqlsh cannot connect to the node either; even 
though we have (by default) client_auth_required set to false.
ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception 
encountered during startup
java.lang.RuntimeException: Unable to gossip with any peers
at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) 
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566)
 ~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823)
 ~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) 
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) 
~[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) 
[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) 
[apache-cassandra-3.11.4.jar:3.11.4]
at 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) 
[apache-cassandra-3.11.4.jar:3.11.4]
INFO  [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - 
Configuration location: file:/etc/cassandra/cassandra.yaml


Something to note is that this error message occurs after a few minutes of the 
node being up. (i.e. there is a delay between start up before this exception is 
thrown).

Information about our cassandra setup

cassandra version: 3.11.4
JDK version: openjdk-8.
Linux: Ubuntu 18.04 (bionic).

cassandra.yaml
endpoint_snitch: Ec2MultiRegionSnitch

server_encryption_options:
  internode_encryption: dc
  keystore: 
  keystore_password: 
  truststore: 
  truststore_password: 

client_encryption_options:
  enabled: false

cassandra-rackdc.properties
prefer_local=true

No obvious errors with SSH output

When starting cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl" added 
to cassandra-env.sh we see SSL logs printed to stdout (Note: Subject and Issuer 
were omitted on purpose).
found key for : cassy-us-west-2 

  
adding as trusted cert:

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

2019-08-26 Thread Michael Carlise
Subroto -

Both tools error; openssl gives errno 111, which made me check the bound
ports on the C* node with encryption flipped.  Port 9042 is not open
(determined by netstat -ant).  I then looked at the log differences between
starting a node with and without encryption.  Without encryption, I get a
bunch of lines like:

OutboundTcpConnection.java:561 - Handshaking version w/ IP

And this happens after a line like

Gossiper.java - Waiting for gossip to settle...

With encryption toggled to 'dc', I don't see any of those lines, presumably
because the gossiper is trying to start but doesn't.
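
On Linux, errno 111 is ECONNREFUSED, i.e. nothing was listening on the port at all, which matches the netstat finding. A small Java sketch of the same kind of check follows; the class name `PortProbe` is hypothetical, and the port list (7000, 7001, 9042) simply assumes the default storage, SSL storage and native transport ports mentioned in this thread. It only tests whether a TCP connection is accepted, not whether a TLS handshake would succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: reports which ports on a host accept plain TCP connections.
final class PortProbe {

    // True if a TCP connection to host:port is accepted within the timeout.
    static boolean accepts(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, unresolved or unreachable host, etc.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = (args.length > 0) ? args[0] : "127.0.0.1";
        int[] ports = {7000, 7001, 9042}; // assumed Cassandra default ports
        for (int port : ports) {
            String state = accepts(host, port, 2000) ? "open" : "closed or unreachable";
            System.out.println(host + ":" + port + " " + state);
        }
    }
}
```

If 7000/7001 are open but 9042 is not, that would be consistent with the node never finishing startup rather than with a client-side problem, since the native transport is only started after the node has joined.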

On Mon, Aug 26, 2019 at 6:51 PM Subroto Barua 
wrote:

> Michael,
>
> Are you able to connect to any c* node via OpenSSL?
>
> Openssl s_client -connect :9042
>
> Cqlsh  —ssl
>
> Subroto
>
> On Aug 26, 2019, at 2:47 PM, Marc Selwan  wrote:
>
> which exact version of OpenJDK are you using? Is it possible you don't
> have JCE on those nodes? (I believe more recent versions of Java 8 has this
> baked in so that might not be it)
>
>
> *Marc Selwan | *DataStax *| *PM, Server Team *|* *(925) 413-7079* *|*
> Twitter 
>
>
>
>
> On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise <
> mcarl...@salesforce.com.invalid> wrote:
>
>>
>> I originally opened this issue on stackoverflow (
>> https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception
>> 
>> ).
>>
>> However, I haven't gotten any responses in over a week.  I'm going to
>> post it here and maybe someone will have an idea on where I can look.
>>
>> We currently run a multi region cassandra cluster in AWS. It runs in four
>> regions, 12 nodes per region. It runs without node to node encryption (or
>> client encryption either). We are trying to enable inter datacenter node to
>> node encryption. However, when we flip encryption over we get an exception
>> that nodes are unable to gossip with any peers.
>>
>> It could possibly be that we didn't build our jks keystore/truststores
>> correctly (more on how we built these files below). But, we additionally do
>> not see intra datacenter communication working (which should be set to
>> unencrypted communication). Additionally, cqlsh cannot connect to the node
>> either; even though we have (by default) client_auth_required set to
>> false.
>>
>> ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception 
>> encountered during startup
>> java.lang.RuntimeException: Unable to gossip with any peers
>> at 
>> org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) 
>> ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:683)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:632)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) 
>> [apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620)
>>  [apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) 
>> [apache-cassandra-3.11.4.jar:3.11.4]
>> INFO  [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - 
>> Configuration location: file:/etc/cassandra/cassandra.yaml
>>
>>
>> Something to note is that this error message occurs after a few minutes
>> of the node being up. (i.e. there is a delay between start up before this
>> exception is thrown).
>>
>> *Information about our cassandra setup*
>>
>> cassandra version: 3.11.4
>> JDK version: openjdk-8.
>> Linux: Ubuntu 18.04 (bionic).
>>
>> *cassandra.yaml*
>>
>> endpoint_snitch: Ec2MultiRegionSnitch
>>
>> server_encryption_options:
>>   internode_encryption: dc
>>   keystore: 
>>   keystore_password: 
>>   truststore: 
>>   truststore_password: 
>>
>> client_encryption_options:
>>   enabled: false
>>
>> 

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

2019-08-26 Thread Michael Carlise
The version given by apt is 8u162-b12-1, which I think corresponds to
OpenJDK 8u162.  When I run
jrunscript -e 'print (javax.crypto.Cipher.getMaxAllowedKeyLength("RC5") >= 256);'
the command returns true.  I'm not sure if that is the best way to verify
that JCE is installed.
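
The same check can be written as a small Java program, which may be easier to run on hosts without jrunscript. This is a sketch only: the class name `JcePolicyCheck` is hypothetical, and it queries "AES" rather than "RC5" on the assumption that AES is the cipher family of interest for TLS. `Cipher.getMaxAllowedKeyLength` returns `Integer.MAX_VALUE` when the unlimited-strength policy is in effect, which, as far as I know, is the default from Java 8u161 onward.

```java
import java.security.NoSuchAlgorithmException;
import javax.crypto.Cipher;

// Hypothetical helper: reports the key length the installed JCE policy allows.
final class JcePolicyCheck {

    // Maximum key length (in bits) permitted by the policy for a transformation.
    static int maxKeyBits(String transformation) throws NoSuchAlgorithmException {
        return Cipher.getMaxAllowedKeyLength(transformation);
    }

    // True if keys of at least 256 bits are permitted for the transformation.
    static boolean allowsStrongKeys(String transformation) throws NoSuchAlgorithmException {
        return maxKeyBits(transformation) >= 256;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        int bits = maxKeyBits("AES");
        String shown = (bits == Integer.MAX_VALUE) ? "unlimited" : bits + " bits";
        System.out.println("Max allowed AES key length: " + shown);
        System.out.println("256-bit keys allowed: " + allowsStrongKeys("AES"));
    }
}
```

A result of "unlimited" (or anything at or above 256 bits) suggests the JCE policy is not what is blocking the internode handshake.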


Michael Carlise

On Mon, Aug 26, 2019 at 5:47 PM Marc Selwan 
wrote:

> which exact version of OpenJDK are you using? Is it possible you don't
> have JCE on those nodes? (I believe more recent versions of Java 8 has this
> baked in so that might not be it)
>
>
> *Marc Selwan | *DataStax *| *PM, Server Team *|* *(925) 413-7079* *|*
> Twitter 
>
>
>
>
> On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise
>  wrote:
>
>>
>> I originally opened this issue on stackoverflow (
>> https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception
>> 
>> ).
>>
>> However, I haven't gotten any responses in over a week.  I'm going to
>> post it here and maybe someone will have an idea on where I can look.
>>
>> We currently run a multi region cassandra cluster in AWS. It runs in four
>> regions, 12 nodes per region. It runs without node to node encryption (or
>> client encryption either). We are trying to enable inter datacenter node to
>> node encryption. However, when we flip encryption over we get an exception
>> that nodes are unable to gossip with any peers.
>>
>> It could possibly be that we didn't build our jks keystore/truststores
>> correctly (more on how we built these files below). But, we additionally do
>> not see intra datacenter communication working (which should be set to
>> unencrypted communication). Additionally, cqlsh cannot connect to the node
>> either; even though we have (by default) client_auth_required set to
>> false.
>>
>> ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception 
>> encountered during startup
>> java.lang.RuntimeException: Unable to gossip with any peers
>> at 
>> org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) 
>> ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:683)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:632)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) 
>> [apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620)
>>  [apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) 
>> [apache-cassandra-3.11.4.jar:3.11.4]
>> INFO  [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - 
>> Configuration location: file:/etc/cassandra/cassandra.yaml
>>
>>
>> Something to note is that this error message occurs after a few minutes
>> of the node being up. (i.e. there is a delay between start up before this
>> exception is thrown).
>>
>> *Information about our cassandra setup*
>>
>> cassandra version: 3.11.4
>> JDK version: openjdk-8.
>> Linux: Ubuntu 18.04 (bionic).
>>
>> *cassandra.yaml*
>>
>> endpoint_snitch: Ec2MultiRegionSnitch
>>
>> server_encryption_options:
>>   internode_encryption: dc
>>   keystore: 
>>   keystore_password: 
>>   truststore: 
>>   truststore_password: 
>>
>> client_encryption_options:
>>   enabled: false
>>
>> *cassandra-rackdc.properties*
>>
>> prefer_local=true
>>
>> *No obvious errors with SSH output*
>>
>> When starting cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl" added
>> to cassandra-env.sh we see SSL logs printed to stdout (*Note: Subject
>> and Issuer were omitted on purpose)*.
>>
>> found key for : cassy-us-west-2
>> adding as trusted cert:
>>   Subject: ...
>>   Issuer:  ...
>>   Algorithm: RSA; Serial number: 0xdad28d843fc73325d4c1a75207d4e74
>>   Valid from Fri May 27 00:00:00 UTC 2016 until Tue May 26 

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

2019-08-26 Thread Subroto Barua
Michael,

Are you able to connect to any c* node via OpenSSL?

Openssl s_client -connect :9042

Cqlsh  —ssl 

Subroto 

> On Aug 26, 2019, at 2:47 PM, Marc Selwan  wrote:
> 
> which exact version of OpenJDK are you using? Is it possible you don't have 
> JCE on those nodes? (I believe more recent versions of Java 8 has this baked 
> in so that might not be it)
> 
> 
> Marc Selwan | DataStax | PM, Server Team | (925) 413-7079 | Twitter 
> 
> 
> 
> 
>> On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise 
>>  wrote:
>> 
>> I originally opened this issue on stackoverflow 
>> (https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception).
>>   
>> 
>> However, I haven't gotten any responses in over a week.  I'm going to post 
>> it here and maybe someone will have an idea on where I can look.
>> 
>> We currently run a multi region cassandra cluster in AWS. It runs in four 
>> regions, 12 nodes per region. It runs without node to node encryption (or 
>> client encryption either). We are trying to enable inter datacenter node to 
>> node encryption. However, when we flip encryption over we get an exception 
>> that nodes are unable to gossip with any peers.
>> 
>> It could possibly be that we didn't build our jks keystore/truststores 
>> correctly (more on how we built these files below). But, we additionally do 
>> not see intra datacenter communication working (which should be set to 
>> unencrypted communication). Additionally, cqlsh cannot connect to the node 
>> either; even though we have (by default) client_auth_required set to false.
>> 
>> ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception 
>> encountered during startup
>> java.lang.RuntimeException: Unable to gossip with any peers
>> at 
>> org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) 
>> ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:683)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.StorageService.initServer(StorageService.java:632)
>>  ~[apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) 
>> [apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620)
>>  [apache-cassandra-3.11.4.jar:3.11.4]
>> at 
>> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) 
>> [apache-cassandra-3.11.4.jar:3.11.4]
>> INFO  [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - 
>> Configuration location: file:/etc/cassandra/cassandra.yaml
>> 
>> Something to note is that this error message occurs after a few minutes of 
>> the node being up. (i.e. there is a delay between start up before this 
>> exception is thrown).
>> 
>> Information about our cassandra setup
>> 
>> cassandra version: 3.11.4
>> JDK version: openjdk-8.
>> Linux: Ubuntu 18.04 (bionic).
>> 
>> cassandra.yaml
>> 
>> endpoint_snitch: Ec2MultiRegionSnitch
>> 
>> server_encryption_options:
>>   internode_encryption: dc
>>   keystore: 
>>   keystore_password: 
>>   truststore: 
>>   truststore_password: 
>> 
>> client_encryption_options:
>>   enabled: false
>> cassandra-rackdc.properties
>> 
>> prefer_local=true
>> No obvious errors with SSH output
>> 
>> When starting cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl" 
>> added to cassandra-env.sh we see SSL logs printed to stdout (Note: Subject 
>> and Issuer were omitted on purpose).
>> 
>> found key for : cassy-us-west-2  
>>  
>> 
>> adding as trusted cert:  
>>  
>> 
>>   Subject: ...   
>>  
>>   
>>   Issuer:  ...   
>>  
>>   
>>   Algorithm: RSA; Serial number: 0xdad28d843fc73325d4c1a75207d4e74   
>>

Re: unable to gossip with peers exception when internode encryption is set to any setting other than 'none'

2019-08-26 Thread Marc Selwan
Which exact version of OpenJDK are you using? Is it possible you don't have
JCE on those nodes? (I believe more recent versions of Java 8 have this
baked in, so that might not be it.)


Marc Selwan | DataStax | PM, Server Team | (925) 413-7079



On Mon, Aug 26, 2019 at 1:56 PM Michael Carlise
 wrote:

>
> I originally opened this issue on stackoverflow
> (https://stackoverflow.com/questions/57516660/cassandra-node-to-node-encryption-throws-unable-to-gossip-with-peers-exception).
>
> However, I haven't gotten any responses in over a week.  I'm going to post
> it here and maybe someone will have an idea on where I can look.
>
> We currently run a multi-region Cassandra cluster in AWS. It runs in four
> regions, 12 nodes per region, without node-to-node encryption (or client
> encryption). We are trying to enable inter-datacenter node-to-node
> encryption. However, when we flip encryption on, we get an exception
> saying that nodes are unable to gossip with any peers.
>
> It could be that we didn't build our JKS keystores/truststores
> correctly (more on how we built these files below). But we also do
> not see intra-datacenter communication working (which should remain
> unencrypted). Additionally, cqlsh cannot connect to the node
> either, even though we have (by default) client_auth_required set to false.
>
> ERROR [main] 2019-08-15 18:46:32,241 CassandraDaemon.java:749 - Exception encountered during startup
> java.lang.RuntimeException: Unable to gossip with any peers
>     at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1435) ~[apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:566) ~[apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:823) ~[apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:683) ~[apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:632) ~[apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:388) [apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:620) [apache-cassandra-3.11.4.jar:3.11.4]
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:732) [apache-cassandra-3.11.4.jar:3.11.4]
> INFO  [main] 2019-08-15 18:47:07,384 YamlConfigurationLoader.java:89 - Configuration location: file:/etc/cassandra/cassandra.yaml
>
>
> Note that this error occurs a few minutes after the node
> starts (i.e., there is a delay between startup and the point at which the
> exception is thrown).
>
> *Information about our cassandra setup*
>
> cassandra version: 3.11.4
> JDK version: openjdk-8.
> Linux: Ubuntu 18.04 (bionic).
>
> *cassandra.yaml*
>
> endpoint_snitch: Ec2MultiRegionSnitch
>
> server_encryption_options:
>   internode_encryption: dc
>   keystore: 
>   keystore_password: 
>   truststore: 
>   truststore_password: 
>
> client_encryption_options:
>   enabled: false
>
> *cassandra-rackdc.properties*
>
> prefer_local=true
>
> *No obvious errors with SSL output*
>
> When starting cassandra with JVM_OPTS="$JVM_OPTS -Djavax.net.debug=ssl" added
> to cassandra-env.sh, we see SSL logs printed to stdout (*Note: Subject and
> Issuer were omitted on purpose*).
>
> found key for : cassy-us-west-2
> adding as trusted cert:
>   Subject: ...
>   Issuer:  ...
>   Algorithm: RSA; Serial number: 0xdad28d843fc73325d4c1a75207d4e74
>   Valid from Fri May 27 00:00:00 UTC 2016 until Tue May 26 23:59:59 UTC 2026
>
> ...
>
> trigger seeding of SecureRandom
> done seeding SecureRandom
>
> Looking at Java SE SSL/TLS connection debugging, this looks correct.
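
For readers unsure what `internode_encryption: dc` in the quoted cassandra.yaml is expected to do, here is a rough sketch of the decision as I understand it from the documented values (`none`, `all`, `dc`, `rack`). It is not Cassandra's actual code; the function name and the DC/rack strings in the usage note are illustrative assumptions.

```python
def should_encrypt(setting: str,
                   local_dc: str, local_rack: str,
                   remote_dc: str, remote_rack: str) -> bool:
    """Return True if traffic to the remote node is expected to be encrypted."""
    if setting == "none":
        return False
    if setting == "all":
        return True
    if setting == "dc":
        # Only traffic that crosses datacenters is encrypted.
        return local_dc != remote_dc
    if setting == "rack":
        # Plaintext only within the same rack of the same datacenter.
        return not (local_dc == remote_dc and local_rack == remote_rack)
    raise ValueError(f"unknown internode_encryption value: {setting!r}")
```

With `dc`, two nodes that the snitch places in the same datacenter, e.g. `should_encrypt("dc", "dc-a", "r1", "dc-a", "r2")`, should talk in plaintext, which is why failing intra-datacenter traffic in the report above is surprising. The result depends entirely on the datacenter names the snitch assigns, so it is worth confirming what Ec2MultiRegionSnitch reports for each node.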

Re: Unable to gossip with any seeds

2014-02-03 Thread Chiranjeevi Ravilla
Hi Sundeep,

Can you please confirm whether you are configuring the two nodes in different
datacenters?

If you are configuring a single datacenter with two nodes, then please change 
the endpoint_snitch from RackInferringSnitch to SimpleSnitch and restart the 
cluster.

Regards,
Chiru
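
As background to the snitch question: to my understanding, RackInferringSnitch takes the datacenter from the second octet of a node's IP address and the rack from the third. The sketch below only illustrates that rule; `infer_location` is a hypothetical helper, not a Cassandra API.

```python
import ipaddress
from typing import Tuple


def infer_location(address: str) -> Tuple[str, str]:
    """Return (datacenter, rack) as inferred from an IPv4 address."""
    # IPv4Address validates the input and normalises it to dotted-quad form.
    octets = str(ipaddress.IPv4Address(address)).split(".")
    # Second octet is taken as the datacenter, third octet as the rack.
    return octets[1], octets[2]


for addr in ("10.0.2.1", "10.2.252.0"):
    dc, rack = infer_location(addr)
    print(f"{addr}: datacenter={dc} rack={rack}")
```

Under this rule, the 10.0.2.1 address in the nodetool status output quoted below maps to datacenter 0, rack 2, which matches what nodetool prints, whereas the seed address 10.2.252.0 used in the telnet test would map to datacenter 2, rack 252.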

On 03-Feb-2014, at 2:17 PM, Sundeep Kambhampati satyasunde...@gmail.com wrote:

 Hi,
 
 I am trying to set up a multi-node Cassandra cluster (2 nodes). I am using 
 apache-cassandra-2.0.4. I am able to start Cassandra on the seed node. But 
 when I try to start it on the other node, it starts and then fails within a 
 few seconds. I can see the following in my error log: 
 
 ERROR 03:23:56,915 Exception encountered during startup
 java.lang.RuntimeException: Unable to gossip with any seeds 
 
 
 I am able to telnet from node 1 to node 0.
 
 telnet 10.2.252.0 9000
 Trying 10.2.252.0...
 Connected to 10.2.252.0.
 Escape character is '^]'.
 ^]
 Connection closed by foreign host.
  
 
 cassandra.yaml
 
 node 0: (sk.r252.0)(seed) 
 
 cluster_name: 'DataCluster'
 num_tokens: 256
 initial_token: 0
 seeds: sk.r252.0
 storage_port: 9000
 ssl_storage_port: 9001
 listen_address: sk.r252.0
 rpc_address: 0.0.0.0
 rpc_port: 8192
 endpoint_snitch: RackInferringSnitch
 
 node 1: (sk.r252.1) 
 cluster_name: 'DataCluster'
 num_tokens: 256
 initial_token: 4611686018427387904
 seeds: sk.r252.0
 storage_port: 9000
 ssl_storage_port: 9001
 listen_address: sk.r252.1
 rpc_address: 0.0.0.0
 rpc_port: 8192
 endpoint_snitch: RackInferringSnitch
 
 
 When I try to start Cassandra on node 1, it fails and the log shows:
 
  INFO 03:23:25,284 Loading persisted ring state
  INFO 03:23:25,564 Starting Messaging Service on port 9000
  INFO 03:23:25,797 Handshaking version with sk.r252.0/10.2.252.0
 ERROR 03:23:56,915 Exception encountered during startup
 java.lang.RuntimeException: Unable to gossip with any seeds
     at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1160)
     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:426)
     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:618)
     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:586)
     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:485)
     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:346)
     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
 java.lang.RuntimeException: Unable to gossip with any seeds
     at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1160)
     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:426)
     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:618)
     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:586)
     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:485)
     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:346)
     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:504)
 Exception encountered during startup: Unable to gossip with any seeds
 ERROR 03:24:02,213 Exception in thread Thread[StorageServiceShutdownHook,5,main]
 java.lang.NullPointerException
     at org.apache.cassandra.service.StorageService.stopNativeTransport(StorageService.java:349)
     at org.apache.cassandra.service.StorageService.shutdownClientServers(StorageService.java:364)
     at org.apache.cassandra.service.StorageService.access$000(StorageService.java:97)
     at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:551)
     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
     at java.lang.Thread.run(Unknown Source)
 
 bin/nodetool status
 
 
 Datacenter: 0
 =============
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address   Load     Owns (effective)  Host ID                               Token  Rack
 UN  10.0.2.1  35.8 KB  100.0%            0ae766db-fcb5-481b-8120-550b672fa9e7  0      2
 
 
 Can someone please help me fix this error? 
 
 
 Thank you,
 Sundeep
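
One detail in the log above is the gap between the "Handshaking version" line and the startup exception. The small sketch below computes it from the two timestamps as they appear in the log; the helper name and format constant are my own, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime

# Time-of-day format used in the quoted Cassandra 2.0 log lines.
LOG_TIME_FORMAT = "%H:%M:%S,%f"


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two same-day log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


elapsed = seconds_between("03:23:25,797", "03:23:56,915")
print(f"{elapsed:.3f} s between handshake and failure")
```

This gives roughly 31 seconds. If I recall correctly, the gossip shadow round gives up after the ring delay (30 seconds by default), so the timing would fit a node that opened a connection to the seed but never received a gossip reply, rather than one that was refused outright.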
 
 



Re: Unable to gossip with any seeds

2014-02-03 Thread Sundeep Kambhampati
Thank you Chiru for the reply. I am configuring single datacenter. I
changed it to SimpleSnitch. However, I am getting the same error.

-Sundeep

