[ 
https://issues.apache.org/jira/browse/CASSANDRA-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13624375#comment-13624375
 ] 

Arya Goudarzi edited comment on CASSANDRA-5432 at 4/6/13 9:15 AM:
------------------------------------------------------------------

OK, I found the problem, but something has changed in this release regarding 
networking that is not clear to me. I use EC2. I had to open all TCP ports to 
the world for the repairs to work. They didn't even work when I allowed all TCP 
traffic within our C* security group. This is not acceptable. What changed in 
1.2.3 in terms of repair routing? Shouldn't it just use the storage port?

We use Ec2MultiRegionSnitch, so it returns DNS names that resolve to local IPs 
for in-region communication and public IPs for cross-region communication. I 
have a C* 1.1.10 cluster in production, and it works fine without my having to 
open the security group completely.
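
To narrow down which ports the security group actually needs, a quick reachability check run from one node against its peers can help. The following is only a minimal sketch: it assumes the default `storage_port` (7000) and `ssl_storage_port` (7001) from cassandra.yaml, and the names `port_open` and `check_peers` are hypothetical helpers, not part of Cassandra. Peer addresses are taken from the command line.

```python
import socket
import sys

# Assumed cassandra.yaml defaults: storage_port 7000, ssl_storage_port 7001.
# Adjust if your configuration differs.
PORTS = (7000, 7001)


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_peers(peers, ports=PORTS):
    """Map each (host, port) pair to whether it is reachable from this node."""
    return {(host, port): port_open(host, port) for host in peers for port in ports}


if __name__ == "__main__":
    # Peers are passed as arguments, e.g. both the private and the public
    # address of each node, to see which path the security group blocks.
    for (host, port), ok in check_peers(sys.argv[1:]).items():
        print(f"{host}:{port} {'open' if ok else 'blocked'}")
```

Running it once with the private addresses and once with the public addresses of the other nodes would show whether the storage ports are reachable on the path the nodes actually use.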

Please advise.
                
> AntiEntropy Repair Freezing on 1.2.3
> ------------------------------------
>
>                 Key: CASSANDRA-5432
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5432
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.3
>         Environment: Ubuntu 10.04.1 LTS
> C* 1.2.3
> Sun Java 6 u43
> JNA Enabled
> Not using VNodes
>            Reporter: Arya Goudarzi
>            Priority: Critical
>
> Since I upgraded our sandbox cluster, I have been unable to run repair on any 
> node, and I will reach our gc_grace_seconds this weekend. Please help. So 
> far, I have tried the following suggestions:
> - nodetool scrub
> - offline scrub
> - running repair on each CF separately. Didn't matter. All got stuck the same 
> way.
> The repair command just gets stuck and the machine sits idle. Only the 
> following logs are printed for the repair job:
>  INFO [Thread-42214] 2013-04-05 23:30:27,785 StorageService.java (line 2379) 
> Starting repair command #4, repairing 1 ranges for keyspace 
> cardspring_production
>  INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,789 AntiEntropyService.java 
> (line 652) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] new session: will 
> sync /X.X.X.190, /X.X.X.43, /X.X.X.56 on range 
> (1808575600,42535295865117307932921825930779602032] for 
> keyspace_production.[comma separated list of CFs]
>  INFO [AntiEntropySessions:7] 2013-04-05 23:30:27,790 AntiEntropyService.java 
> (line 858) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] requesting merkle 
> trees for BusinessConnectionIndicesEntries (to [/X.X.X.43, /X.X.X.56, 
> /X.X.X.190])
>  INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,086 AntiEntropyService.java 
> (line 214) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle 
> tree for ColumnFamilyName from /X.X.X.43
>  INFO [AntiEntropyStage:1] 2013-04-05 23:30:28,147 AntiEntropyService.java 
> (line 214) [repair #cc5a9aa0-9e48-11e2-98ba-11bde7670242] Received merkle 
> tree for ColumnFamilyName from /X.X.X.56
> Please advise. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
