Re: [android-developers] Re: Suspicious TCP RST packets while device is sleeping.

2010-12-26 Thread GDroid
I've been seeing the same behavior myself.
Can someone address this please?

Thanks
Guy

-- 
You received this message because you are subscribed to the Google
Groups Android Developers group.
To post to this group, send email to android-developers@googlegroups.com
To unsubscribe from this group, send email to
android-developers+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/android-developers?hl=en

[android-developers] Re: Suspicious TCP RST packets while device is sleeping.

2010-02-03 Thread Bob Kerns
Well, I don't grok NAT enough to conclude that it's wrong. But I don't
see why they'd do it -- unless they're trying to minimize traffic.
Seems kinda trivial -- and likely more than offset by the later
attempted transmit.

I'm not sure what problem you're trying to solve. It can certainly
happen that one side thinks a connection is open while the other
thinks it's closed. The recipient sends a RST, the sender gets a
connection reset and life goes on.

Is it the delay in discovering the disconnect that's the issue?
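
For anyone who wants to see the "one side thinks it's open, the other thinks it's closed" case on a single machine, here is a minimal Java sketch (not from the thread). It assumes a loopback connection and uses SO_LINGER with a zero timeout so that close() aborts the connection, which typically makes the stack send a RST rather than a FIN. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ResetDemo {
    private ResetDemo() {}

    /**
     * Returns true if the client notices that the peer aborted the
     * connection, either as an IOException or as end-of-stream.
     */
    static boolean clientNoticesAbort() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            Socket accepted = server.accept();
            // Linger enabled with a zero timeout: close() discards the
            // connection state and aborts instead of closing gracefully.
            accepted.setSoLinger(true, 0);
            accepted.close();

            client.setSoTimeout(2000); // never block forever in this demo
            try {
                // The write may or may not fail, depending on whether the
                // abort has already been processed by the client's stack.
                client.getOutputStream().write(1);
                return client.getInputStream().read() == -1;
            } catch (SocketTimeoutException e) {
                return false; // nothing arrived at all: abort not observed
            } catch (IOException e) {
                return true;  // e.g. "Connection reset" or "Broken pipe"
            }
        }
    }
}
```

Note that the client only finds out when it touches the socket, which matches the behavior described in this thread.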


Re: [android-developers] Re: Suspicious TCP RST packets while device is sleeping.

2010-02-03 Thread Dan Sherman
 Is it the delay in discovering the disconnect that's the issue?

Exactly...

The connection stays open to accept data from the server.  There are
definitely points in time when this wouldn't happen for a few minutes, and
if the connection dropped, that wouldn't be a problem if the client noticed
the disconnect immediately (it would just reconnect, and start waiting
again).  However, when the device sleeps, it doesn't see the disconnect
until it wakes up (possibly hours later)...

After a bit more research, it looks like if the client holds a wake lock
indefinitely (just for testing), it gets the reset packet immediately when
the connection is killed, and reconnects immediately.  However, if the
device doesn't hold the lock and goes to sleep, the reset packet is dropped
somewhere...

Is anyone on the dev team able to explain that behavior
(intended/unintended/workaround)?

- Dan
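
One way to bound the delay in noticing a dead connection after a sleep is to track when the server was last heard from and probe on wake-up if that was too long ago. The following is only a sketch under assumptions: the class name is hypothetical, and the caller supplies timestamps from a clock that keeps counting while the device sleeps (on Android, SystemClock.elapsedRealtime() is one such clock). The threshold would be chosen somewhat below the roughly 5 minute idle cutoff observed in this thread.

```java
final class LivenessTracker {
    private final long maxSilenceMillis;
    private long lastHeardMillis;

    /**
     * @param maxSilenceMillis how long the connection may be silent before
     *                         it should be treated as suspect
     * @param nowMillis        current time from a sleep-aware monotonic clock
     */
    LivenessTracker(long maxSilenceMillis, long nowMillis) {
        if (maxSilenceMillis <= 0) {
            throw new IllegalArgumentException("maxSilenceMillis must be positive");
        }
        this.maxSilenceMillis = maxSilenceMillis;
        this.lastHeardMillis = nowMillis;
    }

    /** Call whenever any data arrives from the server. */
    void onTrafficSeen(long nowMillis) {
        lastHeardMillis = nowMillis;
    }

    /** Call on wake-up: true means the connection should be probed or reopened. */
    boolean shouldProbe(long nowMillis) {
        return nowMillis - lastHeardMillis >= maxSilenceMillis;
    }
}
```

Passing the time in explicitly keeps the class free of Android dependencies, so it can be exercised off-device.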


[android-developers] Re: Suspicious TCP RST packets while device is sleeping.

2010-02-02 Thread Bob Kerns
This is expected behavior. TCP connections time out if the connection
is lost, or either side dies. That way, you don't have systems
drowning in dead connections.

The RST packet is telling you that the server has forgotten about the
connection. The client may even report it directly, if it realizes
that it hasn't heard from the server, so you may get a connection
reset error even without seeing an actual RST from the server.

The default timeout is usually 5 minutes, which squares with your
observations. In general, you should not try to solve your problem by
increasing the timeout, but rather by reestablishing the connection,
and maintaining long-lived sessions at a higher level.

I'd recommend, if possible, dropping your AlarmManager ping task, in
favor of reopening your connection. You'll consume fewer resources --
including battery. If you want to minimize the cost of reopening
connections, you can send a ping whenever you happen to wake up,
reopening if necessary. But that doesn't scale that well -- you'll be
able to have more simultaneous clients if you strike a suitable
balance between keeping connections alive, and the cost of reopening
them. For rare interactions, you can support more clients if you open
connections on actual need, and close them promptly when not needed.

It all depends on exactly what you're trying to optimize, and the
environment in which you're operating. The only constant is -- you
can't DEPEND on keeping connections alive. View it as an optimization,
rather than how your application works.

And then make sure it is actually an optimization! So often,
optimizations are a waste of a developer's time.

I'd also recommend avoiding thinking about TCP at the level of packets
(or segments), RST, etc., if at all possible. Unless you're trying to
diagnose a flaky router, or issues with radio connectivity, or things
at a similar level, it's better to focus at a higher level, at least
at the socket level -- is it opening, established, closed, reset?
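
As a rough illustration of "open connections on actual need, and close them promptly", here is a minimal Java sketch. It is not code from this thread: the class name, the one-line request and one-line reply protocol, and the single timeout value are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class OnDemandClient {
    private final String host;
    private final int port;
    private final int timeoutMillis;

    OnDemandClient(String host, int port, int timeoutMillis) {
        this.host = host;
        this.port = port;
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Opens a fresh connection, sends one line, reads one reply line,
     * and closes the socket again before returning.
     */
    String request(String line) throws IOException {
        try (Socket socket = new Socket()) {
            // Bound both the connect and the read so a dead path fails fast.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);

            OutputStream out = socket.getOutputStream();
            out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            String reply = in.readLine();
            if (reply == null) {
                throw new EOFException("server closed the connection without replying");
            }
            return reply;
        }
    }
}
```

Because nothing is held open between requests, there is no idle connection for a NAT to expire. The cost is a new handshake per request, which is the trade-off described above.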

On Feb 2, 1:05 am, Dan Sherman impact...@gmail.com wrote:
 Hey guys, trying to track down a rather elusive problem here...

 I've been playing around with long-standing TCP connections to a server.

 The client opens a TCP connection to the server, sets a timeout at a
 reasonably long period (30 minutes), and adds an AlarmManager task to ping
 the server every 15 minutes (a ping is just a junk packet the server
 responds to with an application-level ack).  Nothing fancy, and everything
 works correctly on the emulator.  The client stays connected to the server
 for as long as I've left it alone (a few hours easily).

 However, as soon as it runs on a device, I see some interesting behavior
 when the device is sleeping (CPU completely off, if I understand correctly).

 If I let the device connect and go to sleep (I can't be 100% certain it is
 asleep, but I wait a good few minutes), and then have the server send an
 unexpected packet to the client, the client most definitely wakes up,
 processes the packet, and sends a response.  The wakeup noticeably takes a
 few extra seconds, but this isn't an issue.

 The issue comes in if I let the device sleep for a more extended period of
 time (somewhere around 5 minutes).  At this time, I see the server drop the
 connection as reset, and the client sit there sleeping.  As soon as the
 device is woken up (by my intervention), and I try to do any network
 actions, it notices the connection isn't good anymore, and starts a
 reconnect (hard-coded to reconnect).

 I've been running tcpdump on both the client, and the server.

 The interaction is as follows:
 Server's point of view:
 - Client connects (a few packets back and forth, application level, etc)
 - 5ish minutes pass (device is sleeping)
 - Client sends a reset packet (connection is torn down, expected)

 From the client's point of view:
 - Connection startup (a few packets back and forth, application level, etc)
 - Device goes to sleep

 The client never sees the TCP reset packet.  Once woken by something
 external (me, the AlarmManager task, etc), the client immediately sees a RST
 packet from the server, tears down the connection, and starts over.

 Anyone care to chime in with ideas as to what is happening?  My only
 thought is that something in between is killing the connection after not
 seeing any data sent between the two for a certain amount of time; however,
 the time between the last packet and the RST isn't a consistent period...

 This behavior is happening when running a G1 on T-Mobile's US 3G network.  It
 happens both when the server code is running remotely (machine in Texas) and
 when it's running on a local machine (Florida).


Re: [android-developers] Re: Suspicious TCP RST packets while device is sleeping.

2010-02-02 Thread Dan Sherman
Hey Bob,

Thanks a lot for the response :)

After a few more hours tonight working on the problem, I've got a bit more
information to present.

From everything I'm seeing, it looks like the issue has to do with NAT'ing
at the network level (T-Mobile, I'd imagine).  The connection is definitely
NAT'd: the client sees itself as one outgoing IP (14.130.xxx.xxx) and port,
and the server sees an incoming connection from a different IP/port
(208.54.xxx.xxx).

My best guess is that T-Mobile is killing the connections at the NAT level
after not seeing traffic on them for a certain period of time (5 minutes in
this case).  This wouldn't be a problem; as you said, a reconnect works just
fine.  And in fact, the higher-level long-lived session control is already
in place, and the client reconnects properly when it senses a disconnect.

The problem comes in based on _how_ the NAT is killing the connection.
Keeping a wake lock on the device to prevent sleeping, and watching tcpdump
on both sides, shows the server receiving a RST packet, but no RST packet is
sent to the client.  The client sits there indefinitely, assuming the
connection is still active.  The second it tries to do something
(user-prompted, or via a ping timer), it sends a PSH packet to the server,
and the server responds with a RST (it closed the connection when it got the
RST from the NAT).

Obviously, if the NAT were to send RSTs in both directions, this wouldn't be
a problem: the client would notice the disconnect and reconnect.  But from
everything I can tell, it notifies the server and leaves the client
completely unaware that the connection has been dropped...

I understand that the NAT needs to clear out old/stale connections, but
sending a RST uni-directionally seems a bit incorrect to me...

Any ideas?

- Dan
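
Since the client only learns about the drop when it next uses the socket, an application-level probe with a short read timeout is one way to turn "silently dead" into a prompt failure. The sketch below is an illustration, not code from this thread: the PING and ACK byte values and the class name are hypothetical, and it assumes no other data is in flight on the stream while the probe runs.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

final class ConnectionProbe {
    // Hypothetical one-byte ping/ack protocol, for illustration only.
    static final int PING = 0x01;
    static final int ACK = 0x02;

    private ConnectionProbe() {}

    /**
     * Sends a ping and waits up to timeoutMillis for the ack.
     * Any error, timeout, end-of-stream, or unexpected byte counts as dead.
     * Leaves the socket's read timeout set to timeoutMillis.
     */
    static boolean isAlive(Socket socket, int timeoutMillis) {
        try {
            socket.setSoTimeout(timeoutMillis);
            OutputStream out = socket.getOutputStream();
            out.write(PING);
            out.flush();
            // A RST from the peer surfaces here (or on the write above)
            // as an IOException; a timeout is a SocketTimeoutException,
            // which is also an IOException.
            return socket.getInputStream().read() == ACK;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A caller would run isAlive on wake-up, or before sending real data, and reconnect when it returns false.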

On Tue, Feb 2, 2010 at 10:25 PM, Bob Kerns r...@acm.org wrote:

 This is expected behavior. TCP connections time out if the connection
 is lost, or either side dies. That way, you don't have systems
 drowning in dead connections.

 The RST packet is telling you that the server has forgotten about the
 connection. The client may even report it directly, if it realizes
 that it hasn't heard from the server, so you may get a connection
 reset error even without seeing an actual RST from the server.

 The default timeout is usually 5 minutes, which squares with your
 observations. In general, you should not try to solve your problem by
 increasing the timeout, but rather by reestablishing the connection,
 and maintaining long-lived sessions at a higher level.

 I'd recommend, if possible, dropping your AlarmManager ping task, in
 favor of reopening your connection. You'll consume less resources --
 including battery. If you want to minimize the cost of reopening
 connections, you can send a ping whenever you happen to wake up,
 reopening if necessary. But that doesn't scale that well -- you'll be
 able to have more simultaneous clients if you strike a suitable
 balance between keeping connections alive, and the cost of reopening
 them. For rare interactions, you can support more clients if you open
 connections on actual need, and close them promptly when not needed.

 It all depends on exactly what you're trying to optimize, and the
 environment in which you're operating. The only constant is -- you
 can't DEPEND on keeping connections alive. View it as an optimization,
 rather than how your application works.

 And then make sure it is actually an optimization! So often,
 optimizations are a waste of a developer's time.

 I'd also recommend avoiding thinking about TCP at the level of packets
 (or segments), RST, etc., if at all possible. Unless you're trying to
 diagnose a flaky router, or issues with radio connectivity, or things
 at a similar level, it's better to focus at a higher level, at least
 at the socket level -- is it opening, established, closed, reset?

 On Feb 2, 1:05 am, Dan Sherman impact...@gmail.com wrote:
  Hey guys, trying to track down a rather elusive problem here...
 
  I've been playing around with long-standing TCP connections to a server.
 
  The client opens a TCP connection to the server, sets a timeout at a
  reasonably long period (30 minutes), and adds an AlarmManager task to ping
  the server every 15 (a ping is just a junk packet the server responds to
  with an application-level ack).  Nothing fancy, and everything works
  correctly on the emulator.  The client stays connected to the server for as
  long as I've left it alone (a few hours easily).
 
  However, as soon as it runs on device, I receive some interesting behavior
  when the device is sleeping (CPU completely off if I understand correctly).
 
  If I let the device connect, and go to sleep (can't be 100% certain it is
  asleep, but I wait a good few minutes).  And have the server send an
  un-expected packet to the client, the client most definitely wakes up,
  processes the packet, and sends a response.  The wakeup noticeably takes a
  few