[ https://issues.apache.org/jira/browse/WHIRR-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163706#comment-13163706 ]
Andrei Savu commented on WHIRR-441:
-----------------------------------

Thanks Evan for reporting this. I think you should use options like whirr.instance-templates-minimum-number-of-instances and whirr.max-startup-retries to improve your odds of starting larger clusters. See the configuration guide: http://whirr.apache.org/docs/0.6.0/configuration-guide.html

> Precondition failure: IndexOutOfBoundsException on cluster setup
> -----------------------------------------------------------------
>
>                 Key: WHIRR-441
>                 URL: https://issues.apache.org/jira/browse/WHIRR-441
>             Project: Whirr
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>         Environment: 64 bit Amazon linux AMI w/ Cloudera CDH3U2 hadoop/hive stack
>            Reporter: Evan Pollan
>
> I was spinning up a 16-node cluster this morning, and, after a series of errors (not uncommon), there was a precondition assertion failure that left the whirr JVM running but dormant for about 20 minutes. I haven't seen this before using the same cluster config and whirr version, and I'm trying again to see if it's reproducible.
> Here's the error:
> Starting 15 node(s) with roles [hadoop-datanode, hadoop-tasktracker]
> Starting 1 node(s) with roles [hadoop-jobtracker, hadoop-namenode]
> << problem applying options to node(us-east-1/sir-b61d7212): org.jclouds.aws.AWSResponseException: request POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with code 400, error: AWSError{requestId='9530b126-fae6-43c8-86a4-b7e2a865c8a1', requestToken='null', code='InternalError', message='An internal error has occurred', context='{Response=, Errors=}'}
>     at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:74)
>     at org.jclouds.http.handlers.DelegatingErrorHandler.handleError(DelegatingErrorHandler.java:69)
>     at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.shouldContinue(BaseHttpCommandExecutorService.java:200)
>     at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:165)
>     at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:134)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:636)
> << problem applying options to node(us-east-1/sir-d4907012): org.jclouds.aws.AWSResponseException: request POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with code 400, error: AWSError{requestId='69215a6b-5455-402f-ae0c-aaaca6245cb6', requestToken='null', code='InternalError', message='An internal error has occurred', context='{Response=, Errors=}'}
>     at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:74)
>     at org.jclouds.http.handlers.DelegatingErrorHandler.handleError(DelegatingErrorHandler.java:69)
>     at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.shouldContinue(BaseHttpCommandExecutorService.java:200)
>     at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:165)
>     at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:134)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:636)
> Nodes started: [[id=us-east-1/i-6dcec30e, providerId=i-6dcec30e, group=logs-cluster, name=null, location=[id=us-east-1c, scope=ZONE, description=us-east-1c, parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], state=RUNNING, loginPort=22, hostname=ip-10-196-130-159, privateAddresses=[10.196.130.159], publicAddresses=[204.236.240.255], hardware=[id=m1.xlarge, providerId=m1.xlarge, name=null, processors=[[cores=4.0, speed=2.0]], ram=15360, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit()), tags=[]], loginUser=ubuntu, userMetadata={}, tags=[]]]
> Starting 2 node(s) with roles [hadoop-datanode, hadoop-tasktracker]
> Dying because - net.schmizz.sshj.transport.TransportException: Broken transport; encountered EOF
> Dying because - net.schmizz.sshj.transport.TransportException: Broken transport; encountered EOF
> <<kex done>> woke to: net.schmizz.sshj.transport.TransportException: Broken transport; encountered EOF
> << (ubuntu@50.16.99.93:22) error acquiring SSHClient(ubuntu@50.16.99.93:22): Broken transport; encountered EOF
> net.schmizz.sshj.transport.TransportException: Broken transport; encountered EOF
>     at net.schmizz.sshj.transport.Reader.run(Reader.java:70)
> Nodes started: [[id=us-east-1/i-cbc6cba8, providerId=i-cbc6cba8, group=logs-cluster, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], state=RUNNING, loginPort=22, hostname=ip-10-120-239-36, privateAddresses=[10.120.239.36], publicAddresses=[50.16.99.93], hardware=[id=m1.xlarge, providerId=m1.xlarge, name=null, processors=[[cores=4.0, speed=2.0]], ram=15360, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit()), tags=[]], loginUser=ubuntu, userMetadata={}, tags=[]], [id=us-east-1/i-cdc6cbae, providerId=i-cdc6cbae, group=logs-cluster, name=null, location=[id=us-east-1d, scope=ZONE, description=us-east-1d, parent=us-east-1, iso3166Codes=[US-VA], metadata={}], uri=null, imageId=us-east-1/ami-da0cf8b3, os=[name=null, family=ubuntu, version=10.04, arch=paravirtual, is64Bit=true, description=ubuntu-images-us/ubuntu-lucid-10.04-amd64-server-20101020.manifest.xml], state=RUNNING, loginPort=22, hostname=ip-10-123-69-36, privateAddresses=[10.123.69.36], publicAddresses=[107.22.69.237], hardware=[id=m1.xlarge, providerId=m1.xlarge, name=null, processors=[[cores=4.0, speed=2.0]], ram=15360, volumes=[[id=null, type=LOCAL, size=10.0, device=/dev/sda1, durable=false, isBootDevice=true], [id=null, type=LOCAL, size=420.0, device=/dev/sdb, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdc, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sdd, durable=false, isBootDevice=false], [id=null, type=LOCAL, size=420.0, device=/dev/sde, durable=false, isBootDevice=false]], supportsImage=And(ALWAYS_TRUE,Or(isWindows(),requiresVirtualizationType(paravirtual)),ALWAYS_TRUE,is64Bit()), tags=[]], loginUser=ubuntu, userMetadata={}, tags=[]]]
> Deleting failed node node us-east-1/sir-d4907012
> Deleting failed node node us-east-1/sir-b61d7212
> Deleting failed node node us-east-1/i-23cec340
> Node deleted: us-east-1/sir-b61d7212
> Node deleted: us-east-1/sir-d4907012
> Node deleted: us-east-1/i-23cec340
> Exception in thread "main" java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
>     at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:301)
>     at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:280)
>     at com.google.common.collect.Iterables.get(Iterables.java:649)
>     at org.apache.whirr.actions.BootstrapClusterAction$1.apply(BootstrapClusterAction.java:226)
>     at org.apache.whirr.actions.BootstrapClusterAction$1.apply(BootstrapClusterAction.java:223)
>     at com.google.common.collect.Iterators$8.next(Iterators.java:765)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:322)
>     at java.util.LinkedHashSet.<init>(LinkedHashSet.java:169)
>     at com.google.common.collect.Sets.newLinkedHashSet(Sets.java:264)
>     at org.apache.whirr.actions.BootstrapClusterAction.getInstances(BootstrapClusterAction.java:222)
>     at org.apache.whirr.actions.BootstrapClusterAction.doAction(BootstrapClusterAction.java:141)
>     at org.apache.whirr.actions.ScriptBasedClusterAction.execute(ScriptBasedClusterAction.java:80)
>     at org.apache.whirr.ClusterController.launchCluster(ClusterController.java:106)
>     at org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:62)
>     at org.apache.whirr.cli.Main.run(Main.java:64)
>     at org.apache.whirr.cli.Main.main(Main.java:97)
> Here's the config file:
> =======================
> whirr.instance-templates=1 hadoop-jobtracker+hadoop-namenode,15 hadoop-datanode+hadoop-tasktracker
> whirr.hadoop.install-function=install_cdh_hadoop
> whirr.hadoop.configure-function=configure_cdh_hadoop
> whirr.provider=aws-ec2
> whirr.cluster-name=logs-cluster
> whirr.identity=ACCESS KEY ID GOES HERE
> whirr.credential=SECRET ACCESS KEY GOES HERE
> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
> whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pub
> whirr.hardware-id=m1.xlarge
> # Using 64 bit Ubuntu 10.04 to avoid defect https://issues.apache.org/jira/browse/WHIRR-148
> whirr.image-id=us-east-1/ami-da0cf8b3
> whirr.aws-ec2-spot-price=1.00

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
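
Editor's note: the two options Andrei recommends could be added to the reporter's properties file roughly as follows. This is a hedged sketch only — the values are illustrative and not from the thread, and the per-template value syntax (count followed by the template spec, mirroring whirr.instance-templates) should be verified against the 0.6.0 configuration guide:

```properties
# Illustrative values only -- tune for your cluster.
# Retry the bootstrap of failed instances up to 2 times before giving up.
whirr.max-startup-retries=2
# Accept the cluster if at least 12 of the 15 worker instances come up.
whirr.instance-templates-minimum-number-of-instances=12 hadoop-datanode+hadoop-tasktracker
```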
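Editor's note on the final exception: the trace shows Guava's Iterables.get failing inside Preconditions.checkElementIndex when BootstrapClusterAction asks for element 0 of an empty collection — plausibly the node list for a template whose every instance failed and was deleted. A minimal sketch of that failure mode, with a hand-rolled check standing in for Guava (the class and list names are illustrative, not Whirr's):

```java
import java.util.Collections;
import java.util.List;

public class EmptyGet {

    // Mimics Guava's Preconditions.checkElementIndex contract:
    // a valid element index lies in [0, size); otherwise throw,
    // using the same message format seen in the report.
    static int checkElementIndex(int index, int size) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException(
                    "index (" + index + ") must be less than size (" + size + ")");
        }
        return index;
    }

    public static void main(String[] args) {
        // Every instance for a template failed and was deleted -> empty list.
        List<String> startedNodes = Collections.emptyList();
        try {
            // Stand-in for the Iterables.get(startedNodes, 0) call in the trace.
            String first = startedNodes.get(checkElementIndex(0, startedNodes.size()));
            System.out.println(first);
        } catch (IndexOutOfBoundsException e) {
            // Prints: index (0) must be less than size (0)
            System.out.println(e.getMessage());
        }
    }
}
```

This suggests the fix belongs in BootstrapClusterAction: guard against templates that ended up with zero live nodes before indexing into the collection.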