> Is it the delay in discovering the disconnect that's the issue?

Exactly...

The connection stays open to accept data from the server.  There are
definitely stretches of a few minutes when the server sends nothing, and
a dropped connection wouldn't be a problem if the client noticed
the disconnect immediately (it would just reconnect and start waiting
again).  However, when the device sleeps, it doesn't see the disconnect
until it wakes up (possibly hours later)...
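
To make the "noticing the disconnect" part concrete, here is a minimal sketch of an application-level liveness check: send one ping and wait a bounded time for the ack. It is written in Python for brevity rather than as Android code; the `PING`/`ACK` byte strings and the `connection_alive` helper are hypothetical names, not the protocol used in this thread. Note that it only helps while the process is actually running, so it does not by itself solve the sleeping-device case.

```python
import socket

PING = b"PING\n"  # hypothetical application-level ping
ACK = b"ACK\n"    # hypothetical application-level ack


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read up to n bytes, stopping early if the peer closes."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:  # orderly close by the peer
            break
        buf += chunk
    return buf


def connection_alive(sock: socket.socket, timeout_s: float = 5.0) -> bool:
    """Send one ping and wait up to timeout_s for the ack.

    Returns False on timeout, reset, broken pipe, or a wrong reply,
    which is the caller's cue to tear down and reconnect.
    """
    try:
        sock.settimeout(timeout_s)
        sock.sendall(PING)
        return _recv_exact(sock, len(ACK)) == ACK
    except OSError:  # includes socket.timeout and ConnectionResetError
        return False
```

A caller would run this whenever the app wakes up, and reconnect if it returns `False`.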

After a bit more research, it looks like if the client holds a wake lock
indefinitely (just for testing), it gets the reset packet immediately when the
connection is killed, and reconnects immediately.  However, if the device
doesn't hold the lock and goes to sleep, the reset packet is dropped
somewhere...
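
For reference, the reconnect side can be sketched roughly like this, again in Python rather than Android code. The function name, the retry count, and the delay values are all assumptions for illustration; the actual client here simply reconnects immediately. The `connect` and `sleep` parameters exist only so the logic can be exercised without a real network.

```python
import socket
import time


def connect_with_backoff(host, port, attempts=5, base_delay_s=1.0,
                         max_delay_s=60.0,
                         connect=socket.create_connection,
                         sleep=time.sleep):
    """Try to connect up to `attempts` times (attempts >= 1).

    Waits base_delay_s after the first failure and doubles the wait
    after each further failure, capped at max_delay_s. Re-raises the
    last error if every attempt fails.
    """
    delay = base_delay_s
    for attempt in range(attempts):
        try:
            return connect((host, port), timeout=10)
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
```

Backing off between attempts is a design choice to avoid hammering the server (and the battery) when the network is genuinely down; an immediate first retry would also be reasonable.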

Is anyone on the dev team able to explain that behavior
(intended, unintended, any workaround)?

- Dan

On Wed, Feb 3, 2010 at 3:30 AM, Bob Kerns <r...@acm.org> wrote:

> Well, I don't grok NAT enough to conclude that it's wrong. But I don't
> see why they'd do it -- unless they're trying to minimize traffic.
> Seems kinda trivial -- and likely more than offset by the later
> attempted transmit.
>
> I'm not sure what problem you're trying to solve. It can certainly
> happen that one side thinks a connection is open while the other
> thinks it's closed. The recipient sends a RST, the sender gets a
> "connection reset" and life goes on.
>
> Is it the delay in discovering the disconnect that's the issue?
>
> On Feb 2, 7:43 pm, Dan Sherman <impact...@gmail.com> wrote:
> > Hey Bob,
> >
> > Thanks a lot for the response :)
> >
> > After a few more hours tonight working on the problem, I've got a bit more
> > information to present.
> >
> > From everything I'm seeing, it looks like the issue has to do with NAT'ing
> > at the network level (T-Mobile, I'd imagine).  The connection is definitely
> > NAT'd: the client sees itself as one outgoing IP (14.130.xxx.xxx) and port,
> > and the server sees an incoming connection from a different IP/port
> > (208.54.xxx.xxx).
> >
> > My best guess is that T-Mobile is killing the connections at the NAT level
> > after not seeing traffic on them for a certain period of time (5
> > minutes in this case).  This wouldn't be a problem; as you said, a reconnect
> > works just fine.  And in fact, the higher-level long-lived session control
> > is already in place, and the client reconnects properly when it senses a
> > disconnect.
> >
> > The problem comes in based on _how_ the NAT is killing the connection.
> > Keeping a wake lock on the device to prevent sleeping, and watching tcpdump on
> > both sides, shows the server receiving a RST packet, but no RST packet is
> > sent to the client.  The client sits there, assuming the connection is still
> > active, indefinitely.  The second it tries to do something (user-prompted,
> > or via a "ping" timer), it sends a PSH packet to the server, and the server
> > responds with a RST (it closed the connection when it got the RST from the
> > NAT).
> >
> > Obviously if the NAT were to send RSTs in both directions, this wouldn't be a
> > problem: the client would notice the disconnect and reconnect.  But from
> > everything I can tell, it notifies the server, and leaves the client
> > completely unaware that the connection has been dropped...
> >
> > I understand that the NAT needs to clear out old/stale connections, but
> > sending a RST uni-directionally seems a bit incorrect to me...
> >
> > Any ideas?
> >
> > - Dan
> >
> >
> >
> > On Tue, Feb 2, 2010 at 10:25 PM, Bob Kerns <r...@acm.org> wrote:
> > > This is expected behavior. TCP connections time out if the connection
> > > is lost, or either side dies. That way, you don't have systems
> > > drowning in dead connections.
> >
> > > The RST packet is telling you that the server has forgotten about the
> > > connection. The client may even report it directly, if it realizes
> > > that it hasn't heard from the server, so you may get a "connection
> > > reset" error even without seeing an actual RST from the server.
> >
> > > The default timeout is usually 5 minutes, which squares with your
> > > observations. In general, you should not try to solve your problem by
> > > increasing the timeout, but rather by reestablishing the connection,
> > > and maintaining long-lived sessions at a higher level.
> >
> > > I'd recommend, if possible, dropping your AlarmManager ping task, in
> > > favor of reopening your connection. You'll consume fewer resources --
> > > including battery. If you want to minimize the cost of reopening
> > > connections, you can send a "ping" whenever you happen to wake up,
> > > reopening if necessary. But that doesn't scale that well -- you'll be
> > > able to have more simultaneous clients if you strike a suitable
> > > balance between keeping connections alive, and the cost of reopening
> > > them. For rare interactions, you can support more clients if you open
> > > connections on actual need, and close them promptly when not needed.
> >
> > > It all depends on exactly what you're trying to optimize, and the
> > > environment in which you're operating. The only constant is -- you
> > > can't DEPEND on keeping connections alive. View it as an optimization,
> > > rather than how your application works.
> >
> > > And then make sure it is actually an optimization! So often,
> > > optimizations are a waste of a developer's time.
> >
> > > I'd also recommend avoiding thinking about TCP at the level of packets
> > > (or segments), RST, etc., if at all possible. Unless you're trying to
> > > diagnose a flaky router, or issues with radio connectivity, or things
> > > at a similar level, it's better to focus at a higher level, at least
> > > at the socket level -- is it opening, established, closed, reset?
> >
> > > On Feb 2, 1:05 am, Dan Sherman <impact...@gmail.com> wrote:
> > > > Hey guys, trying to track down a rather elusive problem here...
> >
> > > > I've been playing around with long-standing TCP connections to a server.
> >
> > > > The client opens a TCP connection to the server, sets a timeout at a
> > > > reasonably long period (30 minutes), and adds an AlarmManager task to "ping"
> > > > the server every 15 minutes (a ping is just a junk packet the server responds to
> > > > with an application-level "ack").  Nothing fancy, and everything works
> > > > correctly on the emulator.  The client stays connected to the server for as
> > > > long as I've left it alone (a few hours easily).
> >
> > > > However, as soon as it runs on a device, I see some interesting behavior
> > > > when the device is sleeping (CPU completely off, if I understand correctly).
> >
> > > > If I let the device connect and go to sleep (I can't be 100% certain it is
> > > > asleep, but I wait a good few minutes), and then have the server send an
> > > > unexpected packet to the client, the client most definitely wakes up,
> > > > processes the packet, and sends a response.  The wakeup noticeably takes a
> > > > few extra seconds, but this isn't an issue.
> >
> > > > The issue comes in if I let the device sleep for a more extended period of
> > > > time (somewhere around 5 minutes).  At this time, I see the server drop the
> > > > connection as reset, and the client sit there sleeping.  As soon as the
> > > > device is woken up (by my intervention), and I try to do any network
> > > > actions, it notices the connection isn't good anymore, and starts a
> > > > reconnect (hard-coded to reconnect).
> >
> > > > I've been running tcpdump on both the client, and the server.
> >
> > > > The interaction is as follows:
> > > > Server's point of view:
> > > > - Client connects (a few packets back and forth, application level, etc)
> > > > - 5ish minutes pass (device is sleeping)
> > > > - Client sends a reset packet (connection is torn down, expected)
> >
> > > > From the client's point of view:
> > > > - Connection startup (a few packets back and forth, application level, etc)
> > > > - Device goes to sleep
> >
> > > > The client never sees the TCP reset packet.  Once woken by something
> > > > external (me, the AlarmManager task, etc), the client immediately sees a RST
> > > > packet from the server, tears down the connection, and starts over.
> >
> > > > Anyone care to chime in with ideas as to what is happening?  My only
> > > > thought is that someone in between is killing the connection due to not
> > > > seeing any data sent between the two after a certain amount of time;
> > > > however, the time between the last packet and the RST isn't a consistent
> > > > period...
> >
> > > > This behavior is happening when running a G1 on T-Mobile's 3G US network.  It
> > > > happens when the server code is running remotely (machine in Texas), as
> > > > well as when it's running on a local machine (Florida).
> >
> > > --
> > > You received this message because you are subscribed to the Google
> > > Groups "Android Developers" group.
> > > To post to this group, send email to android-developers@googlegroups.com
> > > To unsubscribe from this group, send email to
> > > android-developers+unsubscr...@googlegroups.com
> > > For more options, visit this group at
> > > http://groups.google.com/group/android-developers?hl=en
>
>
