Re: [Fab-user] Trouble using fabric with EC2

2010-06-10 Thread Matt Calder
Jeff,

On Thu, Jun 10, 2010 at 6:54 PM, Jeff Forcier j...@bitprophet.org wrote:
 Hi Matt,

 Paramiko doesn't have a connection cache that I'm aware of, but Fabric
 itself does. However, from your description it sounds like you are
 creating a new instance and then connecting to it, so I'm not sure why
 a cache would present a problem.


I'm fairly certain fabric's cache is empty, because the code goes into
the connect function in network.py. The reason I suggested a paramiko
cache is this: it is true that just after an instance goes from
pending to running there is a period when connections fail, but that
period is usually very brief (< 10 sec). That is why I do a sleep(60)
after the startup, to give time for that to settle.

 If you're rebooting a remote system or doing anything to alter the
 networking of an already-connected system, then you can force a
 reconnect by manipulating fabric.state.connections. For example, see
 what the (master-only) reboot() operation does:

    
 http://code.fabfile.org/repositories/entry/fabric/master/fabric/operations.py#L668
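
A minimal sketch of that kind of cache manipulation, assuming the cache
behaves like a dictionary keyed by host string whose values have a
close() method. The helper name drop_cached_connection is hypothetical;
passing fabric.state.connections and env.host_string to it is an
assumption about how you would wire it up, not a copy of what reboot()
does.

```python
def drop_cached_connection(cache, host_string):
    """Close and forget a cached connection so the next call reconnects.

    `cache` is assumed to be dict-like (for example fabric.state.connections)
    and `host_string` is the key the connection was stored under.
    Returns True if an entry was removed, False if nothing was cached.
    """
    # Test membership first so that looking up a missing key cannot
    # trigger a brand-new connection attempt in a lazy cache.
    if host_string not in cache:
        return False
    client = cache[host_string]
    try:
        client.close()
    finally:
        # Remove the entry even if close() raised, so a fresh
        # connection is made on the next lookup.
        del cache[host_string]
    return True
```

Checking membership before indexing matters if the cache connects
lazily on lookup.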


I will look at that.

 If the problem is as straightforward as it sounds, though, I'm
 honestly not sure what's up other than a possible Paramiko bug. Are
 you getting any prompts or anything when you connect to the new
 instance by hand?


I can log in by hand, completely and correctly, from a terminal. I can
do this after the instance is started but before fabric's first run
call. The funny thing is, if I do log in from a terminal, the fabric
run command will work. So, a pseudo code timeline:

# Version 1: this fails; run cannot connect to the instance.
startInstance()
sleep(60)
run('ls')

# Version 2: this succeeds in running ls on the instance.
startInstance()
sleep(60)  # During this sleep, using a terminal, I log into the instance.
run('ls')

Another variation that works is:

# Version 3: this also succeeds.
startInstance()
sleep(60)
# Debugger breakpoint here: using the debugger, look at variables
# (no changes), then proceed.
run('ls')

It is the examples that work that shout "threading error" or
"caching error" to me.

 Another thing to try is to upgrade Paramiko to 1.7.6 if you're using
 the bundled 1.7.4.


I will try that. Thanks for taking the time to help!

Matt

 -Jeff


 --
 Jeff Forcier
 Unix sysadmin; Python/Ruby developer
 http://bitprophet.org


___
Fab-user mailing list
Fab-user@nongnu.org
http://lists.nongnu.org/mailman/listinfo/fab-user


Re: [Fab-user] Trouble using fabric with EC2

2010-06-10 Thread Matt Calder
Bruno,

No, it is in a good group. I can log in using fabric if I restart it
and the instance is already running. I can see that fabric is inside
network.py trying to make the connection. I get one of two errors:
either a timeout or a low-level socket error. In debugging, I added
retries to network.connect and it fails repeatedly. First it times
out a few times, then it gives the low-level socket error. While it
is doing that, I can ssh into the instance from a terminal. I wonder,
does paramiko have a connection cache? Maybe it is not really
retrying? Thanks for any help.


Matt

On Thu, Jun 10, 2010 at 5:23 PM, Bruno Clermont
bruno.clerm...@gmail.com wrote:
 Is your instance in a security group that allows your IP and the port
 you're trying to connect to?
 If it times out, it's probably blocked by Amazon firewalls.

 On Thu, Jun 10, 2010 at 15:07, Matt Calder mvcal...@gmail.com wrote:

 Hi,

 I am having problems using fabric with EC2 instances. I am not
 entirely sure fabric is even the source of the problem, but I am
 hoping someone on this list can suggest a solution or a path to
 investigate. Here is the problem. I start an EC2 instance using boto.
 I wait for the instance to report its state as running. I wait an
 additional 60 seconds after that. Then I try to run things on the
 instance through fabric. At that point I get:

  [ubu...@ec2-174-129-96-241.compute-1.amazonaws.com] run: ls

 Fatal error: Timed out trying to connect to
 ec2-174-129-96-241.compute-1.amazonaws.com

 Aborting.

 Now, the interesting thing is this. During that additional 60 second
 wait I can log into the instance from a separate terminal; moreover,
 when I do that separate login, the fabric login succeeds.
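
 The "wait for running" step described above can be sketched as a small
 polling loop. This is a minimal sketch under stated assumptions: the
 name wait_for_state and its parameters are hypothetical, and get_state
 is any callable that returns the current state string. With boto that
 might be the instance's update method, assuming it returns the
 refreshed state.

```python
import time


def wait_for_state(get_state, wanted="running", timeout=300, interval=5,
                   sleep=time.sleep, clock=time.time):
    """Poll get_state() until it returns `wanted` or `timeout` seconds pass.

    Returns True if the wanted state was seen, False on timeout.
    `sleep` and `clock` are injectable so the loop can be exercised
    without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == wanted:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

 Note that this only waits for the reported state; as the thread shows,
 an instance reported as running may still refuse connections.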

 Obviously, there is not a lot to go on here, but I am not entirely
 sure what additional information would be helpful. If anyone has a
 suggestion of what I might try to do, I would greatly appreciate it.
 Thanks,

 Matt


Re: [Fab-user] Trouble using fabric with EC2

2010-06-14 Thread Matt Calder
All,

After much debugging I finally found a workaround. I'd like to explain
what I did in the hopes that someone might see what the underlying
problem is.

I don't think I made this point explicit in my previous emails, but I
am using fabric as a library. For simplicity, say I have two
functions, createInstance and runStuff. The createInstance function
creates an EC2 instance (using boto) and waits for the instance's
state to be running. The runStuff function uses fabric to run code
on the instance. So, my program looks like:

createInstance()
runStuff()

If I run it as is, I get connection failures inside connect in
fabric/network.py, either a socket error or a timeout. I know
that EC2 instances can report their state as running but still not
be ready to take connections. So I added a sleep to my program:

createInstance()
sleep(240)
runStuff()

Now, four minutes may seem excessive, but even with four minutes I
still get connection errors. During my investigations, I made a few
interesting observations. If I place a debugger breakpoint just after
the sleep, I can break and resume, and I will not get connection
errors. If during the sleep period I ssh into the instance from a
terminal, I will not get connection errors, either in the terminal or
in the program when the sleep passes (yes, really). Lastly, if I run
just createInstance in one process and then afterwards run just
runStuff in a separate process, I do not get connection errors.
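
One way to reproduce the separate-process observation from a single
driver script is sketched below. It is only a sketch: the helper name
run_in_fresh_process and the module name mydeploy are hypothetical, and
it does not explain why the split helps.

```python
import subprocess
import sys


def run_in_fresh_process(code):
    """Run a snippet of Python source in a new interpreter.

    Returns the child's exit code. Each call gets its own process, so
    no sockets, caches or module state are shared between steps.
    """
    return subprocess.call([sys.executable, "-c", code])


# Hypothetical usage, assuming a module `mydeploy` defines both steps:
#   run_in_fresh_process("import mydeploy; mydeploy.createInstance()")
#   run_in_fresh_process("import mydeploy; mydeploy.runStuff()")
```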

The workaround that I found has two parts. First, I removed the
sleep(240). Instead, I placed a sleep of 20 seconds in
paramiko/client.py, at the very beginning of SSHClient.connect. Then I
added logic to connect in fabric/network.py to retry on timeouts and
socket errors up to six times. With these changes, I often connect on
the first attempt (which includes one 20 second sleep) and, if not,
always on the second (in the ten or so runs I have done).
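
The retry half of that workaround can be sketched without patching
either library. This is an illustration, not the actual patch: the
name connect_with_retries is hypothetical, connect is any callable
that opens the connection, and the sleep here happens between failed
attempts rather than at the start of every attempt as described above.

```python
import socket
import time


def connect_with_retries(connect, attempts=6, delay=20, sleep=time.sleep):
    """Call connect() until it succeeds, retrying on socket errors.

    Timeouts and other socket-level errors trigger a retry after
    `delay` seconds. The last error is re-raised once `attempts`
    tries have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except (socket.timeout, socket.error):
            if attempt == attempts:
                raise
            sleep(delay)
```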

Note that the connection errors occur before any ssh activity; the
failing step is just opening a socket to port 22 on the EC2
instance.
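
Since the failure is at the plain TCP level, it can be probed with the
standard library alone, with no ssh involved. A minimal sketch; the
function name port_is_open and its defaults are assumptions for
illustration.

```python
import socket


def port_is_open(host, port=22, timeout=5):
    """Return True if a TCP connection to host:port can be established."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (socket.timeout, socket.error):
        # Covers timeouts, refused connections and name-resolution errors.
        return False
    sock.close()
    return True
```

Calling this in a loop after the instance reports running would show
whether the bare socket connect fails in the same process where fabric
fails.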

For the record, I am running Ubuntu 10.04; however, colleagues report
the same errors on Windows and MacOS.

I hope someone can provide a reason for the behavior I have been
seeing. I don't mind the workaround, but while it works, it is not
based on any real understanding of what the problem is.

Matt





On Thu, Jun 10, 2010 at 8:57 PM, Patrick J McNerthney
pmcnerth...@clearpointmetrics.com wrote:

 Try using the --disable-known-hosts command line option to see if it has
 something to do with a prior use of the same IP address.


Re: [Fab-user] Trouble using fabric with EC2

2010-06-14 Thread Matt Calder
Patrick,

I thought you were on to something there, but alas, no. I get the same
behavior with both the DNS name and the IP address: errors without
the fixes described, and correct connections with them.

Matt

On Mon, Jun 14, 2010 at 5:25 PM, Patrick J McNerthney
pmcnerth...@clearpointmetrics.com wrote:
 Matt,

 Try eliminating the use of DNS, i.e.
 ec2-174-129-96-241.compute-1.amazonaws.com, and instead connect directly
 to the IP address, i.e. 174.129.96.241, to see if that has something to do
 with it.

 Pat
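
 For host names of the form shown above, the address can be read
 straight out of the name without any DNS lookup. A minimal sketch; the
 helper name ip_from_ec2_hostname is hypothetical, and it assumes the
 name follows the ec2-A-B-C-D pattern seen in this thread.

```python
import re

# Matches the leading "ec2-A-B-C-D." part of a public EC2 host name.
_EC2_NAME = re.compile(r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.")


def ip_from_ec2_hostname(hostname):
    """Return the dotted IP embedded in an EC2 public DNS name, or None."""
    match = _EC2_NAME.match(hostname)
    if match is None:
        return None
    return ".".join(match.groups())
```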

