[jira] [Commented] (CASSANDRA-8084) GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE clusters doesnt use the PRIVATE IPS for Intra-DC communications - When running nodetool repair

J.B. Langston (JIRA) Fri, 17 Oct 2014 14:44:00 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175588#comment-14175588
 ]


J.B. Langston commented on CASSANDRA-8084:
------------------------------------------

I don't think sstableloader is working right. Here is the output for 
sstableloader itself:

{code}
automaton@ip-172-31-7-50:~/Keyspace1/Standard1$ sstableloader -d localhost `pwd`
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of 
/home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-320-Data.db 
/home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-326-Data.db 
/home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-325-Data.db 
/home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-283-Data.db 
/home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-267-Data.db 
/home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-211-Data.db 
/home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-301-Data.db 
/home/automaton/Keyspace1/Standard1/Keyspace1-Standard1-jb-316-Data.db to 
[/54.183.192.248, /54.215.139.161, /54.165.222.3, /54.172.118.222]
Streaming session ID: ac5dd440-5645-11e4-a813-3d13c3d3c540
progress: [/54.172.118.222 8/8 (100%)] [/54.183.192.248 8/8 (100%)] [/54.165.222.3 8/8 (100%)] [/54.215.139.161 8/8 (100%)] [total: 100% - 2147483647MB/s (avg: 30MB/s)
{code}

Here is netstats on the node where it is running:

{code}
Responses                       n/a         0            812
automaton@ip-172-31-7-50:~$ nodetool netstats
Mode: NORMAL
Bulk Load ac5dd440-5645-11e4-a813-3d13c3d3c540
    /172.31.7.50 (using /54.183.192.248)
        Receiving 8 files, 1059673728 bytes total
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-10-Data.db 56468194/164372226 bytes(34%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-4-Data.db 278000000/278000000 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-3-Data.db 50674396/50674396 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-5-Data.db 68597334/68597334 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-7-Data.db 139068110/139068110 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-6-Data.db 12682638/12682638 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-9-Data.db 278000000/278000000 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-8-Data.db 68279024/68279024 bytes(100%) received from /172.31.7.50
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0              0
Responses                       n/a         0            970
{code}

Here's netstats on the other node in the same DC:

{code}
automaton@ip-172-31-40-169:~$ nodetool netstats
Mode: NORMAL
Bulk Load ac5dd440-5645-11e4-a813-3d13c3d3c540
    /172.31.7.50 (using /54.183.192.248)
        Receiving 8 files, 1059673728 bytes total
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-239-Data.db 68279024/68279024 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-245-Data.db 278000000/278000000 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-246-Data.db 43078602/50674396 bytes(85%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-240-Data.db 278000000/278000000 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-241-Data.db 12682638/12682638 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-243-Data.db 139068110/139068110 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-242-Data.db 164372226/164372226 bytes(100%) received from /172.31.7.50
            /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-jb-244-Data.db 68597334/68597334 bytes(100%) received from /172.31.7.50
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name                    Active   Pending      Completed
Commands                        n/a         0         249589
Responses                       n/a         0        1390344
{code}

The IP addresses seem backwards in netstats output.

Here is the output of netstat -anp | grep 7000 on the node where sstableloader 
is running:

{code}
tcp        0      0 172.31.7.50:7000        0.0.0.0:*               LISTEN      21544/java
tcp        0      0 172.31.7.50:7000        172.31.5.143:44869      ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:56991       172.31.5.143:7000       ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:7000        54.165.222.3:50968      ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:50599       54.165.222.3:7000       ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:50624       54.165.222.3:7000       ESTABLISHED 22226/java
tcp        0 1132336 172.31.7.50:50626       54.165.222.3:7000       ESTABLISHED 22226/java
tcp        0      0 172.31.7.50:7000        54.172.118.222:58561    ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:37769       54.172.118.222:7000     ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:37796       54.172.118.222:7000     ESTABLISHED 22226/java
tcp        0 1149712 172.31.7.50:37798       54.172.118.222:7000     ESTABLISHED 22226/java
tcp        0      0 172.31.7.50:7000        54.183.192.248:47451    ESTABLISHED 21544/java
tcp    43688      0 172.31.7.50:7000        54.183.192.248:47453    ESTABLISHED 21544/java
tcp        0      0 172.31.7.50:47451       54.183.192.248:7000     ESTABLISHED 22226/java
tcp        0  98464 172.31.7.50:47453       54.183.192.248:7000     ESTABLISHED 22226/java
tcp        0      0 172.31.7.50:41240       54.215.139.161:7000     ESTABLISHED 22226/java
tcp        0  81088 172.31.7.50:41242       54.215.139.161:7000     ESTABLISHED 22226/java
{code}

It's establishing connections to itself (54.183.192.248) and to the other node 
in the local DC (54.215.139.161) using the broadcast address instead of the 
listen address.
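
For anyone who wants to repeat this check on their own nodes, here is a minimal sketch that classifies the remote side of each established connection to the storage port as private or public. It assumes the Linux `netstat -anp` column layout shown above, with each connection on a single line, and port 7000 as the inter-node port. The function names `split_endpoint` and `outbound_storage_peers` are made up for illustration and are not part of any Cassandra tooling.

```python
import ipaddress

STORAGE_PORT = 7000  # inter-node port seen in the netstat output above


def split_endpoint(endpoint):
    """Split 'host:port' into (host, port); port is None for a wildcard '*'."""
    host, _, port = endpoint.rpartition(":")
    return host, (int(port) if port.isdigit() else None)


def outbound_storage_peers(netstat_text, port=STORAGE_PORT):
    """Return {peer_ip: is_private} for ESTABLISHED connections whose
    remote port is `port` (i.e. connections this host opened to a peer)."""
    peers = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state, pid/program
        if len(fields) < 6 or fields[0] != "tcp" or fields[5] != "ESTABLISHED":
            continue
        host, remote_port = split_endpoint(fields[4])
        if remote_port == port:
            peers[host] = ipaddress.ip_address(host).is_private
    return peers


# A few lines taken from the output above.
sample = """\
tcp        0      0 172.31.7.50:7000        0.0.0.0:*               LISTEN      21544/java
tcp        0      0 172.31.7.50:56991       172.31.5.143:7000       ESTABLISHED 21544/java
tcp        0  98464 172.31.7.50:47453       54.183.192.248:7000     ESTABLISHED 22226/java
tcp        0  81088 172.31.7.50:41242       54.215.139.161:7000     ESTABLISHED 22226/java
"""

for peer, private in sorted(outbound_storage_peers(sample).items()):
    print(peer, "private" if private else "PUBLIC")
```

Any same-DC peer that shows up as PUBLIC here is traffic that was expected to go over the listen address.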

> GossipFilePropertySnitch and EC2MultiRegionSnitch when used in AWS/GCE 
> clusters doesnt use the PRIVATE IPS for Intra-DC communications - When 
> running nodetool repair
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8084
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8084
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Config
>         Environment: Tested this in GCE and AWS clusters. Created multi 
> region and multi dc cluster once in GCE and once in AWS and ran into the same 
> problem. 
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=12.04
> DISTRIB_CODENAME=precise
> DISTRIB_DESCRIPTION="Ubuntu 12.04.3 LTS"
> NAME="Ubuntu"
> VERSION="12.04.3 LTS, Precise Pangolin"
> ID=ubuntu
> ID_LIKE=debian
> PRETTY_NAME="Ubuntu precise (12.04.3 LTS)"
> VERSION_ID="12.04"
> Tried to install Apache Cassandra version ReleaseVersion: 2.0.10 and also 
> latest DSE version which is 4.5 and which corresponds to 2.0.8.39.
>            Reporter: Jana
>            Assignee: Yuki Morishita
>              Labels: features
>             Fix For: 2.0.12
>
>         Attachments: 8084-2.0-v2.txt, 8084-2.0-v3.txt, 8084-2.0-v4.txt, 
> 8084-2.0.txt
>
>
> Neither of these snitches (GossipingPropertyFileSnitch and 
> Ec2MultiRegionSnitch) used the private IPs for communication between 
> intra-DC nodes in my multi-region, multi-DC cloud cluster (on both AWS and 
> GCE) when I ran "nodetool repair -local". It works fine during regular reads.
> Here are the cluster configurations I tried, all of which failed:
> AWS + multi-region + multi-DC + GossipingPropertyFileSnitch + 
> prefer_local=true in cassandra-rackdc.properties
> AWS + multi-region + multi-DC + Ec2MultiRegionSnitch + prefer_local=true in 
> cassandra-rackdc.properties
> GCE + multi-region + multi-DC + GossipingPropertyFileSnitch + 
> prefer_local=true in cassandra-rackdc.properties
> GCE + multi-region + multi-DC + Ec2MultiRegionSnitch + prefer_local=true in 
> cassandra-rackdc.properties
> I expect that, with the above setup, all nodes in a given DC communicate 
> via private IPs, since the cloud providers charge for using public IPs but 
> not for using private IPs.
> Public IPs can still be used for inter-DC communication, which is working 
> as expected. 
> Here is a snippet from my log files when I ran the "nodetool repair -local" - 
> Node responding to 'node running repair' 
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,628 Validator.java (line 254) 
> [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree 
> to /54.172.118.222 for system_traces/sessions
>  INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,741 Validator.java (line 254) 
> [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Sending completed merkle tree 
> to /54.172.118.222 for system_traces/events
> Node running repair - 
> INFO [AntiEntropyStage:1] 2014-10-08 14:47:51,927 RepairSession.java (line 
> 166) [repair #1439f290-4efa-11e4-bf3a-df845ecf54f8] Received merkle tree for 
> events from /54.172.118.222
> Note: The IPs it is communicating with are all public IPs; it should have 
> used the private IPs starting with 172.x.x.x.
> YAML file values : 
> The listen address is set to: PRIVATE IP
> The broadcast address is set to: PUBLIC IP
> The SEEDs address is set to: PUBLIC IPs from both DCs
> The SNITCHES tried: GPFS and EC2MultiRegionSnitch
> RACK-DC: Had prefer_local set to true. 
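
The behaviour the reporter expects can be sketched as a small selection rule: use a peer's private address when it is in the same DC and prefer_local is on, otherwise use its public address. This is only an illustration of the expectation stated in the description, not Cassandra's implementation. The `Node` class, the `expected_peer_address` function, the DC names, the pairing of private and public addresses, and the remote node's private address are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    dc: str
    private_ip: str  # what listen_address is set to
    public_ip: str   # what broadcast_address is set to


def expected_peer_address(local, peer, prefer_local=True):
    """Address the reporter expects `local` to use when connecting to `peer`."""
    if prefer_local and local.dc == peer.dc:
        return peer.private_ip
    return peer.public_ip


# Hypothetical DC names; the private/public pairing is assumed for illustration.
local = Node("dc-west", "172.31.7.50", "54.183.192.248")
same_dc_peer = Node("dc-west", "172.31.40.169", "54.215.139.161")
remote_peer = Node("dc-east", "10.0.0.10", "54.172.118.222")  # private IP is made up

print(expected_peer_address(local, same_dc_peer))  # private address, same DC
print(expected_peer_address(local, remote_peer))   # public address, other DC
```

The repair and streaming traffic reported in this issue uses the public address in both cases, which is the mismatch being reported.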



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
