[ https://issues.apache.org/jira/browse/WHIRR-459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224763#comment-13224763 ]
Paolo Castagna edited comment on WHIRR-459 at 3/7/12 10:23 PM:
---------------------------------------------------------------

> Can you try to switch to UDP? ( resolver.setTCP(false) )

This didn't help. However, I think this is a client problem:
1. check your router/broadband modem
2. check your DNS configuration settings (i.e. /etc/resolv.conf, or the equivalent settings on Windows)
3. check your firewall configuration if you are running one
4. run FastDnsResolverTest.java to quickly check whether reverse DNS queries with Whirr are working (a standalone sketch of such a check is at the end of this message)

In my case it was a problem with 1. I can confirm Apache Whirr 0.7.1 works with Apache Hadoop 1.0.1.

You might decide to apply the patch anyway, but that is not going to spare trouble for others whose reverse DNS requests, for some reason, are not working properly.

was (Author: castagna):
> Can you try to switch to UDP? ( resolver.setTCP(false) )

This didn't help. However, I think this is a client problem:
1. check your router/broadband modem
2. check your DNS configuration settings (i.e. /etc/resolv.conf, or the equivalent settings on Windows)
3. run FastDnsResolverTest.java to quickly check whether reverse DNS queries with Whirr are working

> DNS Failure when trying to spawn HBase cluster
> ----------------------------------------------
>
> Key: WHIRR-459
> URL: https://issues.apache.org/jira/browse/WHIRR-459
> Project: Whirr
> Issue Type: Bug
> Affects Versions: 0.7.0
> Environment: Trying to use Whirr from behind a NAT
> Reporter: Akash Ashok
> Attachments: WHIRR-459.patch
>
> While trying to launch an HBase cluster from a system that runs behind a NAT,
> I get the following exception. The cluster is spawned and then it gets
> destroyed. The same command runs fine when launched from another EC2 instance.
>
> bin/whirr launch-cluster --config hbase-ec2.properties
> Bootstrapping cluster
> Configuring template
> Configuring template
> Starting 1 node(s) with roles [zookeeper, hadoop-namenode, hadoop-jobtracker, hbase-master]
> Starting 2 node(s) with roles [hadoop-datanode, hadoop-tasktracker, hbase-regionserver]
> Nodes started: [[id=us-east-1/i-5890203a, providerId=i-5890203a, group=hbase,
> name=hbase-5890203a, location=[id=us-east-1c, scope=ZONE,
> description=us-east-1c, parent=us-east-1, iso3166Codes=[US-VA], metadata={}],
> uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu,
> version=10.04, arch=paravirtual, is64Bit=true,
> description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
> state=RUNNING, loginPort=22, hostname=domU-12-31-39-0F-94-D1,
> privateAddresses=[10.193.151.31], publicAddresses=[204.236.208.250],
> hardware=[id=c1.xlarge, providerId=c1.xlarge, name=null,
> processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL,
> size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null,
> type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false],
> [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false,
> isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd,
> durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0,
> device=/dev/sde, durable=false, isBootDevice=false]],
> supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit()),
> tags=[]], loginUser=ubuntu, userMetadata={Name=hbase-5890203a}, tags=[]]]
> Nodes started: [[id=us-east-1/i-54902036, providerId=i-54902036, group=hbase,
> name=hbase-54902036, location=[id=us-east-1c, scope=ZONE,
> description=us-east-1c, parent=us-east-1, iso3166Codes=[US-VA], metadata={}],
> uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu,
> version=10.04, arch=paravirtual, is64Bit=true,
> description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
> state=RUNNING, loginPort=22, hostname=ip-10-7-29-242,
> privateAddresses=[10.7.29.242], publicAddresses=[75.101.240.254],
> hardware=[id=c1.xlarge, providerId=c1.xlarge, name=null,
> processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL,
> size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null,
> type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false],
> [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false,
> isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd,
> durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0,
> device=/dev/sde, durable=false, isBootDevice=false]],
> supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit()),
> tags=[]], loginUser=ubuntu, userMetadata={Name=hbase-54902036}, tags=[]],
> [id=us-east-1/i-5a902038, providerId=i-5a902038, group=hbase,
> name=hbase-5a902038, location=[id=us-east-1c, scope=ZONE,
> description=us-east-1c, parent=us-east-1, iso3166Codes=[US-VA], metadata={}],
> uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu,
> version=10.04, arch=paravirtual, is64Bit=true,
> description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml],
> state=RUNNING, loginPort=22, hostname=ip-10-108-182-53,
> privateAddresses=[10.108.182.53], publicAddresses=[50.16.48.211],
> hardware=[id=c1.xlarge, providerId=c1.xlarge, name=null,
> processors=[[cores=8.0, speed=2.5]], ram=7168, volumes=[[id=null, type=LOCAL,
> size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null,
> type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false],
> [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false,
> isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd,
> durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0,
> device=/dev/sde, durable=false, isBootDevice=false]],
> supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit()),
> tags=[]], loginUser=ubuntu, userMetadata={Name=hbase-5a902038}, tags=[]]]
> Authorizing firewall ingress to [us-east-1/i-5890203a] on ports [2181] for [122.172.0.45/32]
> Unable to start the cluster. Terminating all nodes.
> org.apache.whirr.net.DnsException: java.net.ConnectException: Connection refused
>     at org.apache.whirr.net.FastDnsResolver.apply(FastDnsResolver.java:83)
>     at org.apache.whirr.net.FastDnsResolver.apply(FastDnsResolver.java:40)
>     at org.apache.whirr.Cluster$Instance.getPublicHostName(Cluster.java:112)
>     at org.apache.whirr.Cluster$Instance.getPublicAddress(Cluster.java:94)
>     at org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler.doBeforeConfigure(HadoopNameNodeClusterActionHandler.java:58)
>     at org.apache.whirr.service.hadoop.HadoopClusterActionHandler.beforeConfigure(HadoopClusterActionHandler.java:86)
>     at org.apache.whirr.service.ClusterActionHandlerSupport.beforeAction(ClusterActionHandlerSupport.java:53)
>     at org.apache.whirr.actions.ScriptBasedClusterAction.execute(ScriptBasedClusterAction.java:100)
>     at org.apache.whirr.ClusterController.launchCluster(ClusterController.java:109)
>     at org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:63)
>     at org.apache.whirr.cli.Main.run(Main.java:64)
>     at org.apache.whirr.cli.Main.main(Main.java:97)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
>     at org.xbill.DNS.TCPClient.connect(TCPClient.java:30)
>     at org.xbill.DNS.TCPClient.sendrecv(TCPClient.java:118)
>     at org.xbill.DNS.SimpleResolver.send(SimpleResolver.java:254)
>     at org.xbill.DNS.ExtendedResolver$Resolution.start(ExtendedResolver.java:95)
>     at org.xbill.DNS.ExtendedResolver.send(ExtendedResolver.java:358)
>     at org.apache.whirr.net.FastDnsResolver.apply(FastDnsResolver.java:69)
>     ... 11 more
> Unable to load cluster state, assuming it has no running nodes.
> java.io.FileNotFoundException: /home/akash/.whirr/hbase/instances (No such file or directory)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(FileInputStream.java:137)
>     at com.google.common.io.Files$1.getInput(Files.java:100)
>     at com.google.common.io.Files$1.getInput(Files.java:97)
>     at com.google.common.io.CharStreams$2.getInput(CharStreams.java:91)
>     at com.google.common.io.CharStreams$2.getInput(CharStreams.java:88)
>     at com.google.common.io.CharStreams.readLines(CharStreams.java:306)
>     at com.google.common.io.Files.readLines(Files.java:580)
>     at org.apache.whirr.state.FileClusterStateStore.load(FileClusterStateStore.java:54)
>     at org.apache.whirr.state.ClusterStateStore.tryLoadOrEmpty(ClusterStateStore.java:58)
>     at org.apache.whirr.ClusterController.destroyCluster(ClusterController.java:143)
>     at org.apache.whirr.ClusterController.launchCluster(ClusterController.java:118)
>     at org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:63)
>     at org.apache.whirr.cli.Main.run(Main.java:64)
>     at org.apache.whirr.cli.Main.main(Main.java:97)
> Starting to run scripts on cluster for phase destroyinstances:
> Starting to run scripts on cluster for phase destroyinstances:
> Finished running destroy phase scripts on all cluster instances
> Destroying hbase cluster
> Cluster hbase destroyed
> Exception in thread "main" java.lang.RuntimeException: org.apache.whirr.net.DnsException: java.net.ConnectException: Connection refused
>     at org.apache.whirr.ClusterController.launchCluster(ClusterController.java:125)
>     at org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:63)
>     at org.apache.whirr.cli.Main.run(Main.java:64)
>     at org.apache.whirr.cli.Main.main(Main.java:97)
> Caused by: org.apache.whirr.net.DnsException: java.net.ConnectException: Connection refused
>     at org.apache.whirr.net.FastDnsResolver.apply(FastDnsResolver.java:83)
>     at org.apache.whirr.net.FastDnsResolver.apply(FastDnsResolver.java:40)
>     at org.apache.whirr.Cluster$Instance.getPublicHostName(Cluster.java:112)
>     at org.apache.whirr.Cluster$Instance.getPublicAddress(Cluster.java:94)
>     at org.apache.whirr.service.hadoop.HadoopNameNodeClusterActionHandler.doBeforeConfigure(HadoopNameNodeClusterActionHandler.java:58)
>     at org.apache.whirr.service.hadoop.HadoopClusterActionHandler.beforeConfigure(HadoopClusterActionHandler.java:86)
>     at org.apache.whirr.service.ClusterActionHandlerSupport.beforeAction(ClusterActionHandlerSupport.java:53)
>     at org.apache.whirr.actions.ScriptBasedClusterAction.execute(ScriptBasedClusterAction.java:100)
>     at org.apache.whirr.ClusterController.launchCluster(ClusterController.java:109)
>     ... 3 more
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
>     at org.xbill.DNS.TCPClient.connect(TCPClient.java:30)
>     at org.xbill.DNS.TCPClient.sendrecv(TCPClient.java:118)
>     at org.xbill.DNS.SimpleResolver.send(SimpleResolver.java:254)
>     at org.xbill.DNS.ExtendedResolver$Resolution.start(ExtendedResolver.java:95)
>     at org.xbill.DNS.ExtendedResolver.send(ExtendedResolver.java:358)
>     at org.apache.whirr.net.FastDnsResolver.apply(FastDnsResolver.java:69)
>     ... 11 more
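For anyone who wants to check reverse DNS outside Whirr, here is a minimal standalone sketch using dnsjava, the library FastDnsResolver is built on. It is not the actual FastDnsResolverTest: the class name ReverseDnsCheck and the default sample address are placeholders, but the setTCP(false) toggle and the PTR lookup use the same dnsjava calls that appear in the stack traces above.

{code:java}
import org.xbill.DNS.ExtendedResolver;
import org.xbill.DNS.Lookup;
import org.xbill.DNS.Name;
import org.xbill.DNS.Record;
import org.xbill.DNS.ReverseMap;
import org.xbill.DNS.Type;

// Placeholder class name; a quick reverse-DNS (PTR) check with dnsjava.
public class ReverseDnsCheck {
  public static void main(String[] args) throws Exception {
    // IP to reverse-resolve; defaults to one of the public addresses from the log above.
    String address = args.length > 0 ? args[0] : "204.236.208.250";

    Name ptr = ReverseMap.fromAddress(address);         // e.g. 250.208.236.204.in-addr.arpa.
    ExtendedResolver resolver = new ExtendedResolver(); // picks up the system's configured name servers (e.g. /etc/resolv.conf)
    resolver.setTCP(false);                             // UDP; flip to true to exercise the TCP path instead

    Lookup lookup = new Lookup(ptr, Type.PTR);
    lookup.setResolver(resolver);
    Record[] records = lookup.run();

    if (records == null) {
      System.out.println("Reverse lookup failed: " + lookup.getErrorString());
    } else {
      for (Record record : records) {
        System.out.println(address + " -> " + record.rdataToString());
      }
    }
  }
}
{code}

If this fails with the same "Connection refused", the problem sits between the client and its configured name servers (points 1-3 in the comment above), not in Whirr itself.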