[ 
https://issues.apache.org/jira/browse/CASSANDRA-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703203#comment-13703203
 ] 

Jason Brown commented on CASSANDRA-5171:
----------------------------------------

While this patch (v1, actually) was reverted in CASSANDRA-5432, the question of 
why it failed to work as expected was never satisfactorily answered. I'm adding 
details here so we can get this ticket done right :).

First, it's helpful to walk through how a node starts gossip in EC2MRS with 
inter-DC (inter-region) enabled (and a Priam-type setup).

# The EC2 instance is started; Priam comes up first and adds publicIP/sslPort to 
the security group's ingress privileges (so this node can accept connections on 
its publicIP/sslPort from anywhere). 
# c* starts and gets the seed nodes' public hostnames from Priam.
# c* gossips to one of the seeds; the public hostname will resolve to that 
seed's public IP addr.
# When OTC goes to write the first message to the seed, it gets a socket from 
OTCP.newSocket(). newSocket() calls isEncryptedChannel() to determine if we 
need to encrypt the data on the wire. As we don't know anything yet about the 
seed node (remember, we haven't started gossip with anyone yet), 
isEncryptedChannel() will always return true when the following are true:
## internode_encryption != none
## we don't know the DC or RACK info for the remote node (which is the case 
when using the EC2MRS). This step is a little funky: OTCP asks the snitch 
for the seed's DC/RACK, EC2MRS returns UNKNOWN-DC/UNKNOWN-RACK, and those 
values just happen not to match a value like "us-east-1" (the current 
node's DC). 
# Create the socket using the remote node's publicIP addr on the SSL port.
# Create the connection and send messages successfully, assuming you've 
opened the SSL port for public addresses on the security group (which Priam 
handles).
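
To make step 4 concrete, here is a rough, self-contained sketch of the decision described above. It is not the actual OTCP code: the class name, the enum, and the method signature are hypothetical stand-ins, and the DC/RACK values are passed in as plain strings instead of being looked up from the snitch.

```java
public class EncryptionDecisionSketch {
    // Mirrors the internode_encryption settings (none, all, dc, rack).
    enum InternodeEncryption { NONE, ALL, DC, RACK }

    // Placeholder values that, per the description above, EC2MRS returns for an unknown peer.
    static final String UNKNOWN_DC = "UNKNOWN-DC";
    static final String UNKNOWN_RACK = "UNKNOWN-RACK";

    // Hypothetical stand-in for OTCP.isEncryptedChannel(): true means "use the SSL port".
    static boolean isEncryptedChannel(InternodeEncryption mode,
                                      String localDc, String localRack,
                                      String remoteDc, String remoteRack) {
        switch (mode) {
            case NONE:
                return false;
            case ALL:
                return true;
            case DC:
                // Encrypt unless the remote node is known to be in our DC.
                return !localDc.equals(remoteDc);
            case RACK:
                // Encrypt unless both DC and rack are known to match.
                return !(localDc.equals(remoteDc) && localRack.equals(remoteRack));
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        // Before gossip: the seed's DC/RACK are unknown, so they cannot match ours.
        System.out.println(isEncryptedChannel(InternodeEncryption.DC,
                "us-east-1", "1a", UNKNOWN_DC, UNKNOWN_RACK));
        // After gossip: the remote DC is known and matches ours.
        System.out.println(isEncryptedChannel(InternodeEncryption.DC,
                "us-east-1", "1a", "us-east-1", "1b"));
    }
}
```

The point of the sketch is only that an unknown DC/RACK can never equal the local values, so any setting other than none ends up choosing the SSL port on first contact.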

Thus, if we are connecting to a node in the same EC2 region, we connect on the 
publicIP (as expected) but use the SSL port.

After we learn, via gossip, about a remote node's DC/RACK/localIP, we can 
choose to reconnect to nodes in the same region on the localIP/nonSSLPort.

Vijay's patch had problems here because, on restart, we would already know the 
DC/RACK from the previous execution of c* on this node, so the check in 
OTCP.isEncryptedChannel() returns false (do not use encryption) and we choose 
the non-SSL port when creating a connection to the publicIP. The connection 
attempt ultimately fails because the non-SSL port is not opened for public 
traffic on the security group (nor should it be).
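
The failure can be modelled in a few lines. This is a simplified sketch under stated assumptions, not Cassandra code: the class and method names are hypothetical, 7000/7001 are assumed to be the non-SSL and SSL storage ports, internode_encryption is assumed to be dc, and the security group is assumed to allow any port between private addresses inside the region but only the SSL port on public addresses.

```java
public class RestartConnectSketch {
    static final int STORAGE_PORT = 7000;     // assumed non-SSL storage port
    static final int SSL_STORAGE_PORT = 7001; // assumed SSL storage port

    // The port follows directly from the encryption decision.
    static int portFor(boolean encrypted) {
        return encrypted ? SSL_STORAGE_PORT : STORAGE_PORT;
    }

    // Hypothetical security group: public addresses accept only the SSL port,
    // private (in-region) addresses accept either port.
    static boolean securityGroupAllows(boolean publicAddress, int port) {
        return !publicAddress || port == SSL_STORAGE_PORT;
    }

    // sameDcKnown: we already know the remote node is in our DC.
    // publicAddress: we are dialing the remote node's public IP.
    static boolean canConnect(boolean sameDcKnown, boolean publicAddress) {
        boolean encrypted = !sameDcKnown; // internode_encryption: dc (assumed)
        return securityGroupAllows(publicAddress, portFor(encrypted));
    }

    public static void main(String[] args) {
        // First boot: DC unknown, public IP, SSL port.
        System.out.println("first boot: " + canConnect(false, true));
        // Restart with saved topology: DC known, public IP, non-SSL port.
        System.out.println("restart:    " + canConnect(true, true));
        // DC known and dialing the private IP on the non-SSL port.
        System.out.println("private IP: " + canConnect(true, false));
    }
}
```

Only the middle case is rejected in this model: the saved topology changes the port choice, but the address being dialed is still the public one.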

To make this patch work, then, I think looking up the localIP address in OTCP's 
ctor would work best. The code would look something like this:

{code}
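    OutboundTcpConnectionPool(InetAddress remoteEp)
    {
        EndpointState epState = Gossiper.instance.getEndpointStateForEndpoint(remoteEp);
        // Prefer the internal (local) IP only when gossip state shows the peer is in our DC.
        if (epState != null
            && epState.getApplicationState(ApplicationState.INTERNAL_IP) != null
            && epState.getApplicationState(ApplicationState.DC) != null
            && epState.getApplicationState(ApplicationState.DC).value.equals(snitch.getDatacenter(FBUtilities.getBroadcastAddress())))
        {
            // Sketch: application states are VersionedValues, so the string value has to be
            // converted to an InetAddress (UnknownHostException handling omitted here).
            id = InetAddress.getByName(epState.getApplicationState(ApplicationState.INTERNAL_IP).value);
        }
        else
        {
            // Nothing known yet (or a different DC): keep using the address we were given.
            id = remoteEp;
        }

        cmdCon = new OutboundTcpConnection(this);
        cmdCon.start();
        ackCon = new OutboundTcpConnection(this);
        ackCon.start();

        metrics = new ConnectionMetrics(id, this);
    }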
{code}

Then you would connect on the localIP addr with the correct port (SSL or 
non-SSL).
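
For anyone who wants to try the selection logic outside of c*, here is a standalone sketch of the same idea. Everything in it is hypothetical scaffolding: PeerInfo stands in for the gossip EndpointState, the addresses are documentation/private-range examples, and the local DC is passed in as a string instead of coming from the snitch.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ConnectAddressSketch {
    // Hypothetical stand-in for what gossip (or a saved system table) knows about a peer.
    static final class PeerInfo {
        final String dc;
        final InetAddress internalIp;

        PeerInfo(String dc, InetAddress internalIp) {
            this.dc = dc;
            this.internalIp = internalIp;
        }
    }

    // Helper: parse an IP literal without forcing callers to handle a checked exception.
    static InetAddress addr(String ipLiteral) {
        try {
            return InetAddress.getByName(ipLiteral);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("bad address: " + ipLiteral, e);
        }
    }

    // Use the internal IP only when the peer is known, has one, and is in our DC;
    // otherwise fall back to the endpoint we were given (the public IP).
    static InetAddress chooseAddress(InetAddress remoteEp, PeerInfo known, String localDc) {
        if (known != null && known.internalIp != null && localDc.equals(known.dc)) {
            return known.internalIp;
        }
        return remoteEp;
    }

    public static void main(String[] args) {
        InetAddress publicIp = addr("203.0.113.10");
        PeerInfo sameDc = new PeerInfo("us-east-1", addr("10.0.0.5"));
        PeerInfo otherDc = new PeerInfo("us-west-2", addr("10.1.0.7"));

        System.out.println(chooseAddress(publicIp, null, "us-east-1").getHostAddress());
        System.out.println(chooseAddress(publicIp, sameDc, "us-east-1").getHostAddress());
        System.out.println(chooseAddress(publicIp, otherDc, "us-east-1").getHostAddress());
    }
}
```

With the address chosen this way, the address and the port agree: an in-region peer is reached on its internal IP with the non-SSL port, and everything else stays on the public IP with the SSL port.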
                
> Save EC2Snitch topology information in system table
> ---------------------------------------------------
>
>                 Key: CASSANDRA-5171
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5171
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.1
>         Environment: EC2
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Critical
>             Fix For: 1.2.7
>
>         Attachments: 0001-CASSANDRA-5171.patch, 0001-CASSANDRA-5171-v2.patch
>
>
> EC2Snitch currently waits for the Gossip information to understand the 
> cluster information every time we restart. It will be nice to use already 
> available system table info similar to GPFS.

