Hey Bob,

Thanks a lot for the response :)

After a few more hours tonight working on the problem, I've got a bit more
information to present.

From everything I'm seeing, it looks like the issue has to do with NAT at
the network level (T-Mobile's, I'd imagine).  The connection is definitely
NAT'd: the client sees itself as one outgoing IP (14.130.xxx.xxx) and port,
and the server sees an incoming connection from a different IP/port
(208.54.xxx.xxx).

My best guess is that T-Mobile is killing connections at the NAT level
after not seeing any traffic on them for a certain period of time (5
minutes in this case).  By itself this wouldn't be a problem since, as you
said, a reconnect works just fine.  In fact, the higher-level long-lived
session control is already in place, and the client reconnects properly
when it senses a disconnect.

The problem is in _how_ the NAT kills the connection.  Holding a wake
lock on the device to prevent it from sleeping, and watching tcpdump on
both sides, shows the server receiving a RST packet, but no RST packet is
sent to the client.  The client sits there indefinitely, assuming the
connection is still active.  The second it tries to do something
(user-prompted, or via a "ping" timer), it sends a PSH packet to the
server, and the server responds with a RST (it closed the connection when
it got the RST from the NAT).

Obviously, if the NAT sent RSTs in both directions, this wouldn't be a
problem: the client would notice the disconnect and reconnect.  But from
everything I can tell, it notifies the server and leaves the client
completely unaware that the connection has been dropped...
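
Since the client gets no RST of its own, the only way it can notice the
dead connection is to send something and wait a bounded time for an answer.
A minimal sketch of that probe follows. It assumes a hypothetical one-byte
ping that the server answers with at least one byte; the class name, the
constant, and the method name are made up for illustration and are not the
actual client code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class HalfOpenProbe {
    /** Hypothetical one-byte ping; the real wire format is an assumption. */
    static final int PING = 0x01;

    /**
     * Sends one ping and waits up to timeoutMs for any reply byte.
     * Returns false on timeout, end-of-stream, or any I/O error
     * (including "connection reset"), so the caller can reconnect.
     */
    static boolean isAlive(Socket socket, int timeoutMs) {
        try {
            // Bound the blocking read so a silently dropped connection
            // cannot hang the client forever.
            socket.setSoTimeout(timeoutMs);

            OutputStream out = socket.getOutputStream();
            out.write(PING);
            out.flush();

            InputStream in = socket.getInputStream();
            // read() returns -1 if the peer closed the connection cleanly.
            return in.read() != -1;
        } catch (IOException e) {
            // SocketTimeoutException is a subclass of IOException, so a
            // missing reply and a reset both end up here.
            return false;
        }
    }
}
```

A caller would run isAlive(socket, someTimeout) from the ping timer or right
before a user-initiated request, and tear down and reconnect when it returns
false.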

I understand that the NAT needs to clear out old/stale connections, but
sending a RST in only one direction seems a bit incorrect to me...

Any ideas?

- Dan

On Tue, Feb 2, 2010 at 10:25 PM, Bob Kerns <[email protected]> wrote:

> This is expected behavior. TCP connections time out if the connection
> is lost, or either side dies. That way, you don't have systems
> drowning in dead connections.
>
> The RST packet is telling you that the server has forgotten about the
> connection. The client may even report it directly, if it realizes
> that it hasn't heard from the server, so you may get a "connection
> reset" error even without seeing an actual RST from the server.
>
> The default timeout is usually 5 minutes, which squares with your
> observations. In general, you should not try to solve your problem by
> increasing the timeout, but rather by reestablishing the connection,
> and maintaining long-lived sessions at a higher level.
>
> I'd recommend, if possible, dropping your AlarmManager ping task, in
> favor of reopening your connection. You'll consume fewer resources --
> including battery. If you want to minimize the cost of reopening
> connections, you can send a "ping" whenever you happen to wake up,
> reopening if necessary. But that doesn't scale that well -- you'll be
> able to have more simultaneous clients if you strike a suitable
> balance between keeping connections alive, and the cost of reopening
> them. For rare interactions, you can support more clients if you open
> connections on actual need, and close them promptly when not needed.
>
> It all depends on exactly what you're trying to optimize, and the
> environment in which you're operating. The only constant is -- you
> can't DEPEND on keeping connections alive. View it as an optimization,
> rather than how your application works.
>
> And then make sure it is actually an optimization! So often,
> optimizations are a waste of a developer's time.
>
> I'd also recommend avoiding thinking about TCP at the level of packets
> (or segments), RST, etc., if at all possible. Unless you're trying to
> diagnose a flaky router, or issues with radio connectivity, or things
> at a similar level, it's better to focus at a higher level, at least
> at the socket level -- is it opening, established, closed, reset?
>
> On Feb 2, 1:05 am, Dan Sherman <[email protected]> wrote:
> > Hey guys, trying to track down a rather elusive problem here...
> >
> > I've been playing around with long-standing TCP connections to a server.
> >
> > The client opens a TCP connection to the server, sets a timeout at a
> > reasonably long period (30 minutes), and adds an AlarmManager task to
> > "ping" the server every 15 (a ping is just a junk packet the server
> > responds to with an application-level "ack").  Nothing fancy, and
> > everything works correctly on the emulator.  The client stays connected
> > to the server for as long as I've left it alone (a few hours easily).
> >
> > However, as soon as it runs on device, I receive some interesting
> > behavior when the device is sleeping (CPU completely off if I
> > understand correctly).
> >
> > If I let the device connect and go to sleep (can't be 100% certain it
> > is asleep, but I wait a good few minutes), and have the server send an
> > unexpected packet to the client, the client most definitely wakes up,
> > processes the packet, and sends a response.  The wakeup noticeably
> > takes a few extra seconds, but this isn't an issue.
> >
> > The issue comes in if I let the device sleep for a more extended
> > period of time (somewhere around 5 minutes).  At this time, I see the
> > server drop the connection as reset, and the client sit there sleeping.
> > As soon as the device is woken up (by my intervention), and I try to do
> > any network actions, it notices the connection isn't good anymore, and
> > starts a reconnect (hard-coded to reconnect).
> >
> > I've been running tcpdump on both the client, and the server.
> >
> > The interaction is as follows:
> > Server's point of view:
> > - Client connects (a few packets back and forth, application level, etc)
> > - 5ish minutes pass (device is sleeping)
> > - Client sends a reset packet (connection is torn down, expected)
> >
> > From the client's point of view:
> > - Connection startup (a few packets back and forth, application level,
> >   etc)
> > - Device goes to sleep
> >
> > The client never sees the TCP reset packet.  Once woken by something
> > external (me, the AlarmManager task, etc), the client immediately sees
> > a RST packet from the server, tears down the connection, and starts
> > over.
> >
> > Anyone care to chime in with ideas as to what is happening?  My only
> > thought is that someone in between is killing the connection due to
> > not seeing any data sent between the two after a certain amount of
> > time, however the time between the last packet and the RST isn't a
> > consistent period...
> >
> > This behavior is happening when running a G1 on T-Mobile's 3G US
> > network.  It happens when the server code is running both remotely
> > (machine in Texas), as well as when it's running on a local machine
> > (Florida).
>
> --
> You received this message because you are subscribed to the Google
> Groups "Android Developers" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/android-developers?hl=en