Andrew Bayer created JCLOUDS-600:
------------------------------------
Summary: runScriptOnNode's sftp put can hang
Key: JCLOUDS-600
URL: https://issues.apache.org/jira/browse/JCLOUDS-600
Project: jclouds
Issue Type: Bug
Components: jclouds-compute
Affects Versions: 1.7.3
Reporter: Andrew Bayer
So I'm still digging at this one, but wanted to open a JIRA while I was still
fresh on it. I'm seeing computeService.runScriptOnNode(...) hang in the sftp
phase a decent chunk of the time, and it's...annoying. I can tell when this has
happened because sftp-server is still running on the VM: the file has been
uploaded, but nothing progresses from there. The thread that's doing the
runScriptOnNode looks like this:
{code}
Thread 6343: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
- java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=226 (Interpreted frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(int, long) @bci=106, line=1037 (Interpreted frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(int, long) @bci=25, line=1326 (Interpreted frame)
- com.google.common.util.concurrent.AbstractFuture$Sync.get(long) @bci=3, line=268 (Interpreted frame)
- com.google.common.util.concurrent.AbstractFuture.get(long, java.util.concurrent.TimeUnit) @bci=9, line=96 (Interpreted frame)
- org.jclouds.compute.callables.BlockUntilInitScriptStatusIsZeroThenReturnOutput.get(long, java.util.concurrent.TimeUnit) @bci=3, line=194 (Interpreted frame)
- org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSshAndBlockUntilComplete.doCall() @bci=14, line=60 (Interpreted frame)
- org.jclouds.compute.callables.RunScriptOnNodeAsInitScriptUsingSsh.call() @bci=27, line=77 (Interpreted frame)
- org.jclouds.compute.internal.BaseComputeService.runScriptOnNode(java.lang.String, org.jclouds.scriptbuilder.domain.Statement, org.jclouds.compute.options.RunScriptOptions) @bci=109, line=615 (Interpreted frame)
- org.jclouds.compute.internal.BaseComputeService.runScriptOnNode(java.lang.String, org.jclouds.scriptbuilder.domain.Statement) @bci=6, line=599 (Interpreted frame)
- org.jclouds.compute.ComputeService$runScriptOnNode.call(java.lang.Object, java.lang.Object, java.lang.Object) @bci=22 (Interpreted frame)
{code}
And then additional threads like this:
{code}
Thread 6418: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Interpreted frame)
- java.lang.Object.wait() @bci=2, line=502 (Interpreted frame)
- net.schmizz.sshj.connection.channel.ChannelInputStream.read(byte[], int, int) @bci=49, line=128 (Interpreted frame)
- net.schmizz.sshj.sftp.PacketReader.readIntoBuffer(byte[], int, int) @bci=25, line=49 (Interpreted frame)
- net.schmizz.sshj.sftp.PacketReader.getPacketLength() @bci=11, line=57 (Interpreted frame)
- net.schmizz.sshj.sftp.PacketReader.readPacket() @bci=1, line=73 (Interpreted frame)
- net.schmizz.sshj.sftp.PacketReader.run() @bci=8, line=85 (Interpreted frame)

Thread 6417: (state = IN_NATIVE)
- java.net.SocketInputStream.socketRead0(java.io.FileDescriptor, byte[], int, int, int) @bci=0 (Compiled frame; information may be imprecise)
- java.net.SocketInputStream.read(byte[], int, int) @bci=84, line=146 (Compiled frame)
- net.schmizz.sshj.transport.Reader.run() @bci=41, line=68 (Compiled frame)
{code}
I'm not sure exactly how these threads are related, and these particular
threads may not actually be directly related at all: there are a bunch of each
sort of thread going, since I'm working across multiple instances. So far as I
can tell, though, there's one of each for each hung sftp.
I'm timing out the runScriptOnNode calls and then retrying them, and 90+% of
the time, it seems to all work fine the second run through, so I'm really not
sure what's going on. Anyone have any ideas?
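
For anyone trying to reproduce the workaround, here is a minimal sketch of a
timeout-and-retry wrapper of the kind described above. It is not the code from
my setup: the class name RetryWithTimeout, the method callWithRetry and its
parameters are hypothetical, and the task is a plain Callable standing in for
the runScriptOnNode call. Note that cancel(true) only interrupts the worker
thread; a thread blocked in a socket read may not respond to that, so a hung
attempt can linger in the background.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of jclouds.
final class RetryWithTimeout {

    private RetryWithTimeout() {}

    /**
     * Runs the task up to maxAttempts times. Each attempt is abandoned if it
     * does not finish within the timeout; only timeouts trigger a retry.
     */
    static <T> T callWithRetry(Callable<T> task, int maxAttempts, long timeout, TimeUnit unit)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        // Daemon threads, so an attempt that never returns cannot keep the JVM alive.
        ExecutorService executor = Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        try {
            TimeoutException lastTimeout = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                Future<T> future = executor.submit(task);
                try {
                    return future.get(timeout, unit);
                } catch (TimeoutException e) {
                    // Best effort: interrupt the hung attempt, then try again.
                    future.cancel(true);
                    lastTimeout = e;
                } catch (ExecutionException e) {
                    // The task itself failed; surface its exception without retrying.
                    Throwable cause = e.getCause();
                    if (cause instanceof Exception) {
                        throw (Exception) cause;
                    }
                    throw e;
                }
            }
            throw lastTimeout;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With jclouds, the task would be something like
() -> computeService.runScriptOnNode(nodeId, script), where nodeId and script
are whatever the caller already has. The attempt count and timeout are
judgment calls; two attempts matches the "works the second time" behaviour I'm
seeing.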