[ https://issues.apache.org/jira/browse/WHIRR-655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457038#comment-13457038 ]
Steve Loughran commented on WHIRR-655:
--------------------------------------

This can be triggered by asking for every hadoop service at once; if I break it up then (somehow) it gets cached and comes in faster the next time.

The console doesn't pick up the problem, it just blocks happily
{code}
Authorizing firewall ingress to [hdp1] on ports [8021] for [10.0.0.82/32]
no answer to DNS resolution attempt for 10.0.0.82; using fallback
>> running >> InitScript{INSTANCE_NAME=configure-hadoop-namenode_hadoop-jobtracker_hadoop-datanode_hadoop-tasktracker} >> on node(hdp1)
2012-09-17 15:09:32
{code}

But {{whirr.log}} shows the system is in trouble.
{code}
2012-09-17 15:08:07,317 DEBUG [jclouds.compute] (main) >> running [/tmp/init-configure-hadoop-namenode_hadoop-jobtracker_hadoop-datanode_hadoop-tasktracker init] as stevel@10.0.0.82
2012-09-17 15:08:07,381 DEBUG [jclouds.compute] (main) << init(0)
2012-09-17 15:08:07,381 DEBUG [jclouds.compute] (main) >> running [echo 'null'|sudo -S /tmp/init-configure-hadoop-namenode_hadoop-jobtracker_hadoop-datanode_hadoop-tasktracker start] as stevel@10.0.0.82
2012-09-17 15:08:08,466 DEBUG [jclouds.compute] (main) << start(0)
2012-09-17 15:09:05,783 ERROR [net.schmizz.sshj.transport.TransportImpl] (reader) Dying because - java.net.SocketTimeoutException: Read timed out
{code}

Threads
{code}
"sftp reader" prio=5 tid=7fd053093000 nid=0x117698000 in Object.wait() [117697000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <7f42b0088> (a net.schmizz.sshj.common.Buffer$PlainBuffer)
    at java.lang.Object.wait(Object.java:485)
    at net.schmizz.sshj.connection.channel.ChannelInputStream.read(ChannelInputStream.java:128)
    - locked <7f42b0088> (a net.schmizz.sshj.common.Buffer$PlainBuffer)
    at net.schmizz.sshj.sftp.PacketReader.readIntoBuffer(PacketReader.java:49)
    at net.schmizz.sshj.sftp.PacketReader.getPacketLength(PacketReader.java:57)
    at net.schmizz.sshj.sftp.PacketReader.readPacket(PacketReader.java:73)
    at net.schmizz.sshj.sftp.PacketReader.run(PacketReader.java:85)

"reader" prio=5 tid=7fd04d859000 nid=0x117595000 runnable [117594000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at net.schmizz.sshj.transport.Reader.run(Reader.java:68)

"user thread 1" prio=5 tid=7fd0530db000 nid=0x115e88000 waiting on condition [115e87000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.jclouds.predicates.RetryablePredicate.apply(RetryablePredicate.java:74)
    at org.jclouds.compute.callables.BlockUntilInitScriptStatusIsZeroThenReturnOutput.run(BlockUntilInitScriptStatusIsZeroThenReturnOutput.java:149)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:680)

"com.google.inject.internal.util.$Finalizer" daemon prio=5 tid=7fd04fe41000 nid=0x116e04000 in Object.wait() [116e03000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <7f4a0b2a8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
    - locked <7f4a0b2a8> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
    at com.google.inject.internal.util.$Finalizer.run(Finalizer.java:114)

"Low Memory Detector" daemon prio=5 tid=7fd04d80b800 nid=0x11641f000 runnable [00000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=9 tid=7fd04d80b000 nid=0x11631c000 waiting on condition [00000000]
   java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=9 tid=7fd04d80a000 nid=0x116219000 waiting on condition [00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=9 tid=7fd04d809800 nid=0x116116000 waiting on condition [00000000]
   java.lang.Thread.State: RUNNABLE

"Surrogate Locker Thread (Concurrent GC)" daemon prio=5 tid=7fd04d808800 nid=0x116013000 waiting on condition [00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=8 tid=7fd04f941800 nid=0x115d85000 in Object.wait() [115d84000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <7f44e48f0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
    - locked <7f44e48f0> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=7fd04f941000 nid=0x115c82000 in Object.wait() [115c81000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <7f44e5ea0> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <7f44e5ea0> (a java.lang.ref.Reference$Lock)

"main" prio=5 tid=7fd050800800 nid=0x10de10000 waiting on condition [10de0f000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <7f42b0210> (a com.google.common.util.concurrent.AbstractFuture$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:280)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at org.apache.whirr.actions.ByonClusterAction.doAction(ByonClusterAction.java:151)
    at org.apache.whirr.actions.ScriptBasedClusterAction.execute(ScriptBasedClusterAction.java:126)
    at org.apache.whirr.ByonClusterController.configureServices(ByonClusterController.java:99)
    at org.apache.whirr.ClusterController.configureServices(ClusterController.java:153)
    at org.apache.whirr.ClusterController.launchCluster(ClusterController.java:114)
    at org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:69)
    at org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:59)
    at org.apache.whirr.cli.Main.run(Main.java:69)
    at org.apache.whirr.cli.Main.main(Main.java:102)

"VM Thread" prio=9 tid=7fd04f93c800 nid=0x115b7f000 runnable

"Gang worker#0 (Parallel GC Threads)" prio=9 tid=7fd04f800000 nid=0x111218000 runnable

"Gang worker#1 (Parallel GC Threads)" prio=9 tid=7fd04f801000 nid=0x11131b000 runnable

"Gang worker#2 (Parallel GC Threads)" prio=9 tid=7fd04f801800 nid=0x11141e000 runnable

"Gang worker#3 (Parallel GC Threads)" prio=9 tid=7fd04f802000 nid=0x111521000 runnable

"Gang worker#4 (Parallel GC Threads)" prio=9 tid=7fd04f802800 nid=0x111624000 runnable

"Gang worker#5 (Parallel GC Threads)" prio=9 tid=7fd04f803800 nid=0x111727000 runnable

"Gang worker#6 (Parallel GC Threads)" prio=9 tid=7fd04f808800 nid=0x11182a000 runnable

"Gang worker#7 (Parallel GC Threads)" prio=9 tid=7fd04f809000 nid=0x11192d000 runnable

"Concurrent Mark-Sweep GC Thread" prio=9 tid=7fd04f8e6800 nid=0x1157f9000 runnable

"Gang worker#0 (Parallel CMS Threads)" prio=9 tid=7fd04f8e5800 nid=0x114df3000 runnable

"Gang worker#1 (Parallel CMS Threads)" prio=9 tid=7fd04f8e6000 nid=0x114ef6000 runnable

"VM Periodic Task Thread" prio=10 tid=7fd04d81d800 nid=0x116522000 waiting on condition

"Exception Catcher Thread" prio=10 tid=7fd050801800 nid=0x10e03f000 runnable

JNI global references: 1317
{code}

> installations can time out on slow networks -timeout needs to be configurable
> -----------------------------------------------------------------------------
>
>                 Key: WHIRR-655
>                 URL: https://issues.apache.org/jira/browse/WHIRR-655
>             Project: Whirr
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>         Environment: cable connection; first yum install of a set of artifacts hosted on S3
>            Reporter: Steve Loughran
>            Priority: Minor
>
> Downloading RPMs from a yum repo over a slow link can take so long that
> scripts start to time out & your systems are left in an indeterminate state.
> It ought to be possible to specify timeouts on a per-cluster basis.
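
As a rough illustration of the per-cluster timeout the issue description asks for, here is a minimal Java sketch. It is not Whirr or jclouds code: the property key {{example.script.timeout.ms}}, the 60 second default, and the class and method names are all hypothetical. It reads a timeout from a cluster's properties, falls back to a default when none is set, and polls a condition until it holds or the deadline passes, so the caller gets a definite failure rather than blocking indefinitely.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class ConfigurableTimeout {
    // Hypothetical property key, for illustration only; not an existing Whirr setting.
    static final String TIMEOUT_KEY = "example.script.timeout.ms";
    // Assumed default used when the cluster configuration does not set the key.
    static final long DEFAULT_TIMEOUT_MS = 60_000L;

    // Returns the per-cluster timeout in milliseconds, or the default if unset.
    static long timeoutMillis(Properties clusterConfig) {
        String raw = clusterConfig.getProperty(TIMEOUT_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_TIMEOUT_MS;
        }
        long value = Long.parseLong(raw.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(TIMEOUT_KEY + " must be positive: " + value);
        }
        return value;
    }

    // Polls the condition until it is true or timeoutMs has elapsed.
    // Returns true on success, false on timeout or interruption.
    static boolean awaitCondition(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            long remainingMs = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(pollMs, remainingMs));
            } catch (InterruptedException e) {
                // Restore the interrupt flag and report failure to the caller.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A caller on a slow link would set the key to a larger value in that cluster's properties and pass the result of {{timeoutMillis}} to {{awaitCondition}}; clusters that leave it unset keep the default.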