Re: Can Cassandra client programs use hostnames instead of IPs?

2014-05-16 Thread Huiliang Zhang
Thanks. In my case there is no public IP and a VPN cannot be set up, so it
seems I have to run an EMR job to operate on the AWS Cassandra cluster.

I got timeout errors while running the EMR job:
java.lang.RuntimeException: Could not retrieve endpoint ranges:
    at org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:333)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:149)
    at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:144)
    at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:228)
    at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:213)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:658)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:375)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
    at org.apache.thrift.transport.TSocket.open(TSocket.java:183)
    at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
    at org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.createThriftClient(BulkRecordWriter.java:348)
    at org.apache.cassandra.hadoop.BulkRecordWriter$ExternalClient.init(BulkRecordWriter.java:293)
    ... 12 more
Caused by: java.net.ConnectException: Connection timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:178)
    ... 15 more

Any suggestions would be appreciated.
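
The innermost cause in the trace above is a plain TCP connect timeout raised
while opening the Thrift socket, so one way to narrow this down is to test
reachability from an EMR node directly. The following is only a minimal
sketch in Python, not part of the Hadoop job: the function name can_connect
is made up for illustration, and the port you pass should be whatever
rpc_port is set to in your cassandra.yaml (9160 is the usual Thrift default,
which is an assumption about this cluster).

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Any socket-level failure (timeout, refused connection, unresolvable
    hostname) is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout and socket.gaierror are both subclasses of OSError.
        return False
```

Calling can_connect("10.0.0.12", 9160) from a task node (the address is a
hypothetical private IP) and getting False would point at routing or security
group rules rather than at the bulk loader itself.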








Re: Can Cassandra client programs use hostnames instead of IPs?

2014-05-13 Thread Ben Bromhead
You can set listen_address in cassandra.yaml to a hostname 
(http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html).

Cassandra will use the IP address returned by a DNS query for that hostname. On 
AWS you don't have to assign an elastic IP; all instances come with a public IP 
that lasts for the instance's lifetime (if you use EC2-Classic or your VPC is 
set up to assign them).

Note that whatever hostname you set in a node's listen_address must resolve to 
the private IP, as AWS instances only have network access via their private 
address. Traffic to an instance's public IP is NATed and forwarded to the 
private address, so you may as well just use the node's IP address.
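
To see which address a given hostname would hand to Cassandra, you can resolve 
it yourself on the node and check whether the result is a private address. This 
is a rough Python sketch under my own assumptions: the helper names resolve_ipv4 
and is_private are invented for illustration, and it only looks at IPv4.

```python
import ipaddress
import socket


def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the resolver reports for `hostname`."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def is_private(ip):
    """Return True if `ip` falls in a private range (for example 10.0.0.0/8)."""
    return ipaddress.ip_address(ip).is_private
```

If the hostname you plan to put in listen_address resolves, on the node itself, 
to an address for which is_private returns False, other nodes will most likely 
not be able to reach it on that address.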

If you run Hadoop on instances in the same AWS region, it will be able to access 
your Cassandra cluster via private IP. If you run Hadoop externally, just use 
the public IPs. 

If you run in a VPC without public addressing and want to connect from external 
hosts, you will want to look at a VPN 
(http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_VPN.html).

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359




On 13/05/2014, at 4:31 AM, Huiliang Zhang zhl...@gmail.com wrote:

 Hi,
 
 Cassandra returns the IPs of the nodes in the Cassandra cluster for further 
 communication between the Hadoop program and the Cassandra cluster. Is there a 
 way to configure the Cassandra cluster to return hostnames instead of IPs? My 
 Cassandra cluster is on AWS and has no elastic IPs that can be accessed from 
 outside AWS.
 
 Thanks,
 Huiliang