[
https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166908#comment-14166908
]
J.B. Langston edited comment on CASSANDRA-8084 at 10/10/14 2:26 PM:
--------------------------------------------------------------------
I tested and it appears to work. Here is the cluster I am testing with:
{code}
Datacenter: DC1_EAST
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.165.222.3    711.26 MB  1       25.0%  dd449706-2059-4b65-ae98-0012d2cf8f67  rack1
UN  54.172.118.222  561.14 MB  1       25.0%  18cd7d0a-74ca-4835-a7ff-7ffaa92b35ef  rack1
Datacenter: DC1_WEST
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns   Host ID                               Rack
UN  54.183.192.248  721.2 MB   1       25.0%  c4dd37f1-d937-4876-8669-f0b01a3942db  rack1
UN  54.215.139.161  909.26 MB  1       25.0%  16499349-8cef-4a62-a99c-ab145cb70921  rack1
{code}
I wasn't sure initially because the logs and `nodetool netstats` still show the
broadcast address. You can see here that nodetool netstats, when run on
54.215.139.161, shows we are streaming from 54.183.192.248 (the broadcast
address of the other node in the same DC):
{code}
Mode: NORMAL
Repair dbc7ea40-5082-11e4-8190-c9fac3589773
/54.183.192.248
Receiving 9 files, 229856794 bytes total
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-100-Data.db
58878176/58878176 bytes(100%) received from /54.183.192.248
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-106-Data.db
97856/97856 bytes(100%) received from /54.183.192.248
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-109-Data.db
69407704/69407704 bytes(100%) received from /54.183.192.248
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-108-Data.db
3203116/3203116 bytes(100%) received from /54.183.192.248
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-102-Data.db
12545306/12545306 bytes(100%) received from /54.183.192.248
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-103-Data.db
69407704/69407704 bytes(100%) received from /54.183.192.248
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-104-Data.db
1536228/1536228 bytes(100%) received from /54.183.192.248
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-105-Data.db
12589230/12589230 bytes(100%) received from /54.183.192.248
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-107-Data.db
2191474/2191474 bytes(100%) received from /54.183.192.248
Sending 5 files, 109645980 bytes total
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-87-Data.db
14323672/14323672 bytes(100%) sent to /54.183.192.248
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-97-Data.db
20581730/20581730 bytes(100%) sent to /54.183.192.248
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-98-Data.db
3161694/3161694 bytes(100%) sent to /54.183.192.248
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-95-Data.db
69407704/69407704 bytes(100%) sent to /54.183.192.248
/var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-jb-99-Data.db
2171180/2171180 bytes(100%) sent to /54.183.192.248
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 1495191
Responses n/a 0 714928
{code}
However, the output of `sudo netstat -anp | grep 7000 | sort -k5` shows that,
within the local DC, we are connecting to the other node only on its listen
address (172.31.7.50):
{code}
tcp  0    0  172.31.5.143:7000   0.0.0.0:*             LISTEN       17279/java
tcp  0    0  172.31.5.143:7000   172.31.5.143:34936    ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:7000   172.31.5.143:34937    ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:7000   172.31.5.143:34938    ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:34936  172.31.5.143:7000     ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:34937  172.31.5.143:7000     ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:34938  172.31.5.143:7000     ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:7000   172.31.7.50:52125     ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:7000   172.31.7.50:52126     ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:57502  172.31.7.50:7000      ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:57560  172.31.7.50:7000      ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:57601  172.31.7.50:7000      ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:57602  172.31.7.50:7000      ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:7000   54.165.222.3:33876    ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:7000   54.165.222.3:33878    ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:44120  54.165.222.3:7000     ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:44198  54.165.222.3:7000     ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:7000   54.172.118.222:54515  ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:7000   54.172.118.222:54518  ESTABLISHED  17279/java
tcp  0    0  172.31.5.143:35960  54.172.118.222:7000   ESTABLISHED  17279/java
tcp  0  161  172.31.5.143:35880  54.172.118.222:7000   ESTABLISHED  17279/java
unix 2 [ ] DGRAM 7000 613/acpid
The only connections established to the broadcast addresses are for the nodes
in the other DC (54.165.222.3 and 54.172.118.222).
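For anyone who wants to repeat this check without reading the netstat output by eye, here is a minimal Python sketch (not part of the patch). It groups the remote peers on the storage port by whether their address is private or public. The function name `classify_peers` is made up for illustration, and the sketch assumes the netstat columns appear in the order shown above (proto, recv-q, send-q, local, remote, state).

```python
import ipaddress
from collections import defaultdict

STORAGE_PORT = 7000  # inter-node port seen in the netstat output above


def classify_peers(netstat_text, port=STORAGE_PORT):
    """Group remote peers of established TCP connections by address type.

    Hypothetical helper: assumes `netstat -anp` style lines with the columns
    proto, recv-q, send-q, local address, remote address, state.
    """
    peers = defaultdict(set)
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip wrapped PID lines, unix sockets, and non-established sockets.
        if len(fields) < 6 or fields[0] != "tcp" or fields[5] != "ESTABLISHED":
            continue
        local_ip, local_port = fields[3].rsplit(":", 1)
        remote_ip, remote_port = fields[4].rsplit(":", 1)
        # Keep only connections where one end is the storage port.
        if str(port) not in (local_port, remote_port):
            continue
        # Ignore connections the node makes to itself.
        if remote_ip == local_ip:
            continue
        is_private = ipaddress.ip_address(remote_ip).is_private
        peers["private" if is_private else "public"].add(remote_ip)
    return {kind: sorted(ips) for kind, ips in peers.items()}


if __name__ == "__main__":
    sample = (
        "tcp 0 0 172.31.5.143:7000 0.0.0.0:* LISTEN 17279/java\n"
        "tcp 0 0 172.31.5.143:57502 172.31.7.50:7000 ESTABLISHED 17279/java\n"
        "tcp 0 0 172.31.5.143:44120 54.165.222.3:7000 ESTABLISHED 17279/java\n"
        "tcp 0 0 172.31.5.143:35960 54.172.118.222:7000 ESTABLISHED 17279/java\n"
    )
    print(classify_peers(sample))
```

With the fix applied, the "private" group should contain only nodes from the local DC and the "public" group only nodes from the other DC.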
Is use of the broadcast address in netstats and the logs intentional? I can see
some customers getting confused by this. On the other hand, it matches what we
show for nodetool ring and status, so I could see arguments both ways.
> GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE
> clusters doesnt use the PRIVATE IPS for Intra-DC communications - When
> running nodetool repair
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-8084
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8084
> Project: Cassandra
> Issue Type: Bug
> Components: Config
> Environment: Tested this in GCE and AWS clusters. Created multi
> region and multi dc cluster once in GCE and once in AWS and ran into the same
> problem.
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=12.04
> DISTRIB_CODENAME=precise
> DISTRIB_DESCRIPTION="Ubuntu 12.04.3 LTS"
> NAME="Ubuntu"
> VERSION="12.04.3 LTS, Precise Pangolin"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu precise (12.04.3 LTS)"
> VERSION_ID="12.04"
> Tried Apache Cassandra ReleaseVersion 2.0.10 and also the latest DSE
> version, 4.5, which corresponds to Cassandra 2.0.8.39.
> Reporter: Jana
> Assignee: Yuki Morishita
> Labels: features
> Fix For: 2.0.11
>
> Attachments: 8084-2.0.txt
>
>
> Neither of these snitches (GossipingPropertyFileSnitch and
> Ec2MultiRegionSnitch) used the PRIVATE IPs for communication between
> intra-DC nodes in my multi-region, multi-DC cloud cluster (on both AWS and
> GCE) when I ran "nodetool repair -local". It works fine during regular reads.
> Here are the cluster configurations I tried, all of which failed:
> - AWS + Multi-Region + Multi-DC + GossipingPropertyFileSnitch +
>   prefer_local=true in the cassandra-rackdc.properties file.
> - AWS + Multi-Region + Multi-DC + Ec2MultiRegionSnitch + prefer_local=true
>   in the cassandra-rackdc.properties file.
> - GCE + Multi-Region + Multi-DC + GossipingPropertyFileSnitch +
>   prefer_local=true in the cassandra-rackdc.properties file.
> - GCE + Multi-Region + Multi-DC + Ec2MultiRegionSnitch + prefer_local=true
>   in the cassandra-rackdc.properties file.
> With the above setup, I expect all nodes in a given DC to communicate via
> private IPs, since the cloud providers charge for traffic over public IPs
> but not for traffic over private IPs.
> The nodes can still use PUBLIC IPs for inter-DC communication, which is
> working as expected.
> Here is a snippet from my log files from when I ran "nodetool repair -local".
> Node responding to the node running repair:
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,628 Validator.java (line 254)
> [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree
> to /54.172.118.222 for system_traces/sessions
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,741 Validator.java (line 254)
> [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree
> to /54.172.118.222 for system_traces/events
> Node running repair:
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,927 RepairSession.java (line
> 166) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Received merkle tree for
> events from /54.172.118.222
> Note: The IPs it is communicating with are all PUBLIC IPs; it should have
> used the PRIVATE IPs starting with 172.x.x.x.
> YAML file values:
> The listen address is set to: PRIVATE IP
> The broadcast address is set to: PUBLIC IP
> The seeds are set to: PUBLIC IPs from both DCs
> The snitches tried: GossipingPropertyFileSnitch (GPFS) and Ec2MultiRegionSnitch
> cassandra-rackdc.properties: prefer_local was set to true.
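The behavior the reporter expects from prefer_local can be summarized in a short Python sketch. This is an illustration of the expectation described in the issue, not Cassandra's actual implementation; the `Peer` class and `address_to_connect` function are hypothetical names. The example pairs 54.183.192.248 with 172.31.7.50, as the comment above does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    """Hypothetical description of a remote node, for illustration only."""
    dc: str
    listen_address: str     # private IP
    broadcast_address: str  # public IP


def address_to_connect(local_dc, peer, prefer_local=True):
    """Return the address a node in `local_dc` is expected to dial.

    Expected rule from the issue description: same DC and prefer_local
    enabled -> private (listen) address; otherwise -> public (broadcast).
    """
    if prefer_local and peer.dc == local_dc:
        return peer.listen_address
    return peer.broadcast_address


if __name__ == "__main__":
    west_peer = Peer("DC1_WEST", "172.31.7.50", "54.183.192.248")
    # Seen from another node in DC1_WEST: expect the private address.
    print(address_to_connect("DC1_WEST", west_peer))
    # Seen from a node in DC1_EAST: expect the public address.
    print(address_to_connect("DC1_EAST", west_peer))
```

The bug reported here is that repair streaming did not follow this rule for intra-DC peers, even though regular reads did.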
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)