Re: Client did not send nnn bytes as expected

jchimene Fri, 05 Dec 2008 10:10:50 -0800

Hi Amit,

You don't make this easy, do you...


o     Just to be clear: goodness happens when the client sends 2 TCP
      segments, which become 3 IP packets on the wire, which the server
      reassembles into the 2 original TCP segments.
      Badness happens when the client sends 2 TCP segments, which
      become 3 IP packets on the wire, but the server ends up with one
      complete TCP segment and 1 incomplete one.
      Can you reproduce this in your lab? I'm guessing "no"; otherwise
      you would not have deployed the app...

o     Do you see a NAK (in TCP terms, duplicate ACKs) or a retransmission
      at the client after the dropped fragment?

o     Pls. try traceroute from your lab and from the client box. What
      are the differences?

o     It's now looking like an IP-layer issue. The fact that
      fragmentation doesn't occur on the larger packet is interesting.

o     The two separate TCP segments lead to an assumption that you can
      identify requests from the same client box at the server. IOW, you
      have an application-level protocol that lets you reassemble the two
      segments into a single request. I'm sure this is the case, but such
      a design isn't explicitly stated in your message. Your server
      application never sees the 2 -> 3 split, since in the normal case
      your server app only sees 2 packets from the client. I'm reluctant
      to say this, but part of this process may require proof that the
      protocol design is resilient to network transmission errors.

o     I'd start playing around w/ different packet sizes and
      transmission rates (via ping) to see if you can trip any triggers.
      It may be a combination of buffering/congestion between the client
      and the server.
      Did you try ping w/ different packet sizes? I realize that you have
      different servers. Does the connection between the client and the
      server run over the public switched network, or does it use a
      private circuit?

o     There have been posts in this thread w/r/t SSL and IE. Are they
      relevant?
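For the ping experiment, a minimal Python sketch of a payload-size sweep could look like this. It assumes Linux iputils ping, where -s sets the ICMP payload size and -M do sets the Don't Fragment bit, so a packet larger than the path MTU fails instead of being silently fragmented. The 28 bytes of overhead assume IPv4 without options (20-byte IP header plus 8-byte ICMP header); build_ping_cmd and sweep are illustrative names, not an existing tool.

```python
import subprocess
from typing import Callable, Dict, Iterable, List

IPV4_HEADER = 20  # bytes, IPv4 header without options (assumption)
ICMP_HEADER = 8   # bytes, ICMP echo header


def build_ping_cmd(host: str, payload: int, count: int = 3) -> List[str]:
    # Assumes Linux iputils ping: -s = ICMP payload bytes, -M do = set the
    # Don't Fragment bit, so an oversized packet fails instead of fragmenting.
    return ["ping", "-c", str(count), "-M", "do", "-s", str(payload), host]


def sweep(host: str, payloads: Iterable[int],
          runner: Callable[..., subprocess.CompletedProcess] = subprocess.run
          ) -> Dict[int, bool]:
    """Map each on-the-wire IPv4 packet size to whether ping got replies."""
    results: Dict[int, bool] = {}
    for payload in payloads:
        proc = runner(build_ping_cmd(host, payload),
                      capture_output=True, text=True)
        # ping exits with 0 only when it received replies.
        results[payload + IPV4_HEADER + ICMP_HEADER] = proc.returncode == 0
    return results
```

Running sweep("your.server.example", range(1400, 1481, 8)) (hypothetical host name) from both the lab and a client box, and comparing the largest size that still succeeds, would show whether the two paths have different MTUs.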

Cheers,
jec

On Dec 5, 1:21 am, Amit Kasher <[EMAIL PROTECTED]> wrote:
> Hi,
> We have spent the past 2 days working on this, and have some new
> findings.
>
> We made contact with one of our customers who is encountering this
> issue more frequently than others, and he granted us access to his
> computer (using LogMeIn). We installed Wireshark on his computer, as
> well as on the server. We managed to reproduce the problem with both
> sniffers in action, and analyzed the exact corresponding TCP segments
> according to their sequence and ack numbers. Here are the results.
>
> This is what happens in the valid state:
> The client sends 2 TCP segments for a GWT service call, and they are
> supposed to be reassembled into a single PDU, namely the entire HTTP
> request. The first segment always contains the HTTP request header,
> and the second TCP segment always contains the HTTP request body. For
> instance, we see that the client sends a first segment of 969 bytes
> and a second segment of 454 bytes. On the server we see that these 2
> segments become 3 segments. The first is still 969 bytes and contains
> the HTTP request header; the second is 363 bytes (80% of the original
> second segment), and the third is the remaining 91 bytes (20% of the
> original 454 bytes).
>
> In the invalid state, when the problem occurs, the third segment
> simply does not arrive at the server. It seems that something along the
> path has split the second 454-byte segment into 2 segments, and only
> sent the first one to the server.
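To put a number on the truncation from a capture, here is a minimal Python sketch that compares the declared Content-Length with the body bytes that actually arrived. It assumes you have exported the raw request bytes as the server saw them (for example via Wireshark's Follow TCP Stream), that the request uses CRLF line endings, and that the body is sent with a plain Content-Length rather than chunked encoding; body_completeness is a hypothetical helper name.

```python
from typing import Optional, Tuple


def body_completeness(raw: bytes) -> Tuple[int, int, float]:
    """Return (declared_length, received_length, fraction_received).

    Assumes a Content-Length body (not chunked) and CRLF line endings.
    """
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        # Even the header block is incomplete, so there is no body to measure.
        raise ValueError("end of HTTP header not found")
    declared: Optional[int] = None
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            declared = int(value.strip())
            break
    if declared is None:
        raise ValueError("no Content-Length header")
    received = len(body)
    fraction = received / declared if declared else 1.0
    return declared, received, fraction
```

With the figures above (454 bytes declared, only the 363-byte piece received) it reports a ratio of about 0.80, matching the 80% seen in the tcpdump traces.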
>
> 1. If this is something on the client's machine, how come we don't see
> it in the sniffer? (We even tried removing all firewall/antivirus
> software and reinstalling the network card driver.)
> 2. If this is not something on the client's machine, how come some
> clients encounter this much more often than others, who never encounter
> it?
>
> Can it be some kind of network equipment that some of our clients
> (reminder - different ISPs) go through, and others don't?
>
> Unfortunately, this new info still leaves us clueless...
>
> On Dec 3, 5:16 pm, jchimene <[EMAIL PROTECTED]> wrote:
>
> > On Dec 2, 11:20 pm, Amit Kasher <[EMAIL PROTECTED]> wrote:
>
> > > Hi and thanks again for your responses.
>
> > No Prob.
>
> > If this "opportunity for excellence" is as pervasive as you suspect,
> > installing software on a client's computer should be unnecessary,
> > since installing it on *any* computer *anywhere on the planet* should
> > reliably reproduce the issue. You say that tcpdump
> > shows the packet truncation, so I'm not sure I understand the
> > requirement to install something on a client machine. My goal in these
> > past responses has been to absolutely prove that it's the
> > serialization code (by factoring out the serialization code using
> > ping), not something peculiar to the transport or session layers.
>
> > Are you using the public switched network to provide client/server
> > connectivity? If not, nothing you've said so far would eliminate your
> > network transport service.
>
> > I find it hard to believe it's GWT, as the cargo size is so small as
> > to be insignificant, and others would have reported this issue by now.
> > I have to admit that I'm not a user of Java serialization, so there
> > may have been reports of serialization issues like this of which I'm
> > blissfully unaware. From everything you're saying, it really looks
> > like the problem is in user-space. It may be a certain code path that
> > leads to the same serialization invocation logic. I'd start pulling
> > this code apart, instrumenting the hell out of it and running it
> > through JUnit or some such automated testing environment. Again, I
> > understand you've probably done this...
>
> > I'm wondering if there's a specific byte-pattern that's causing this.
> > Have you tried reordering the structure members? Also, have you
> > eliminated buffer corruption issues? Since it's cross-browser, what
> > does the -pretty flag + Firebug reveal? Esp. when profiling the code?
> > (Although I must admit that you've probably tried all that type of
> > debugging by now).
>
> > Good luck,
> > jec
>
> > > A few more subtle observations and insights:
> > > 1. It's probably not the server. There are several reasons that lead
> > > us to believe that the server is not the cause of this issue: (a) We
> > > switched hosting providers. (b) These providers reside in completely
> > > different geographical locations - countries. (c) We have always been
> > > using JBoss on CentOS, but this issue occurs both when we work with
> > > Apache as a front end using mod_jk to tomcat, as well as when
> > > eliminating this tier and having clients go directly to tomcat - using
> > > it as an HTTP server. (d) The tcpdump sniffer explicitly shows that the
> > > server receives ALWAYS EXACTLY 80% of the request payload. Unless this
> > > is something even lower level in that machine (the VPS software used -
> > > virtuozzo, the network card/driver, etc.), these observations pretty
> > > much provide an alibi for the server... I think we'd better focus on
> > > other places.
> > > 2. There are indications that this is not inside the browser as well:
> > > (a) It happens in several GWT versions. (b) It happens "to" all
> > > browsers, which is a strong clue, since this code is completely
> > > different from browser to browser: GWT uses the MsXMLHTTP ActiveX
> > > object in IE, but entirely different objects in other browsers. Since
> > > this is the underlying mechanism used to perform RPC, and the problem
> > > happens with more than one of them, it is unlikely to be the cause.
> > > Still, it seems that this MUST be the GWT/client code, since these
> > > clients, who encounter this issue much more often, don't have
> > > problems on any other websites (we managed to talk to several of
> > > them).
> > > One thing that comes to mind is perhaps the GWT serialization code? I
> > > don't know...
>
> > > Therefore, currently, aside from the possibility that there's a bug in
> > > the GWT serialization code, there's also the possibility that it's
> > > something in the network, even though these clients are from various
> > > ISPs, and geographical locations. Yes, I notice the dead end as
> > > well...
>
> > > These observations somewhat reduce the anticipated benefit (let alone
> > > the feasibility...) of several of your (MUCH APPRECIATED, THOUGH)
> > > suggestions:
> > > 1. ping from the lab
> > > 2. perl HTTP server
>
> > > Despite that, we ARE happy about any suggestion and willing to put the
> > > required effort, so we'll try to make progress in these directions.
>
> > > Our situation now is that we assume that the data arrives corrupted to
> > > the server, and we should see how this data comes out of the client.
> > > Therefore we will also try to install a sniffer in a client computer
> > > in which this occurs (though we have been trying to do that for quite
> > > a long time now).
>
> > > On Dec 2, 10:29 pm, jchimene <[EMAIL PROTECTED]> wrote:
>
> > > > Hi Amit,
>
> > > > One other thing:
>
> > > > I'm getting the impression that you also have a custom server. If it's
> > > > an identical configuration across all server instances, then you also
> > > > have to prove that it's not the server. Again, I'd code a simple HTTP
> > > > server in Perl (because there's no problem so intractable that it
> > > > can't be made worse with a Perl application) and use it to test
> > > > against your application.
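Along the same lines, a throwaway test server only needs to log how many body bytes were declared versus received. The sketch below uses Python's standard http.server in place of Perl (a swap, not the tool suggested above); it reads the body with a per-connection timeout so that a request whose tail never arrives is logged as TRUNCATED instead of hanging. CountingHandler and serve are hypothetical names, and the 10-second timeout is an arbitrary assumption.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    # StreamRequestHandler applies this as a socket timeout (seconds), so a
    # missing body tail raises socket.timeout instead of blocking forever.
    timeout = 10

    def do_POST(self):
        expected = int(self.headers.get("Content-Length", 0))
        received = 0
        try:
            while received < expected:
                # read1 returns whatever is available (at most one raw read).
                chunk = self.rfile.read1(min(65536, expected - received))
                if not chunk:  # client closed the connection
                    break
                received += len(chunk)
        except socket.timeout:  # body stopped arriving
            pass
        status = "OK" if received == expected else "TRUNCATED"
        self.log_message("%s expected=%d received=%d", status, expected, received)
        body = f"{status} {received}/{expected}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    HTTPServer(("", port), CountingHandler).serve_forever()
```

Pointing a client at serve(8080) on a spare box takes JBoss, Tomcat, and mod_jk out of the picture; any TRUNCATED line in the log would show the shortfall independently of the production stack.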
>
> > > > Cheers,
> > > > jec
>
> > > > On Dec 2, 9:11 am, Amit Kasher <[EMAIL PROTECTED]> wrote:
>
> > > > > Hi,
> > > > > Thanks for your reply. Answers are inline.
>
> > > > > On Dec 2, 5:50 pm, jchimene <[EMAIL PROTECTED]> wrote:
> > > > > > Hi,
>
> > > > > > A few questions:
>
> > > > > > o Are all packets sent to the server the same size?
>
> > > > > No, they are not.
>
> > > > > > o What is that size?
>
> > > > > This depends on the service call - somewhere between 150 and 2000
> > > > > bytes.
> > > > > I will mention again that by using a sniffer (tcpdump), it seems that
> > > > > EVERY time this issue occurs, the actual packets the server receives
> > > > > are ALWAYS EXACTLY 80% of what it should have received. This, again,
> > > > > was very encouraging to find as a clue, but unfortunately led me
> > > > > nowhere.
>
> > > > > > o Have you checked for other types of congestion?
>
> > > > > Congestion? Unfortunately, I don't have any control over the client's
> > > > > environment since this is an internet application and I can't
> > > > > reproduce it.
>
> > > > > > o Is this entirely TCP/IP? Have you checked maxrss?
>
> > > > > maxrss? I'm not sure I understood the relevance... TCP/IP is obviously
> > > > > used, it is the underlying protocol of HTTP...
>
> > > > > > o Have you enabled logging on intermediate nodes to see if there are
> > > > > > congestion issues?
>
> > > > > I wish I could... I don't have any control over any node before the
> > > > > server. It is a CentOS VPS hosted internet application. I will state
> > > > > that this occurred in several hosting providers, in several countries
> > > > > and geographical locations.
>
> > > > > > o Is this related to a specific time of day (although it probably
> > > > > > happens between 10:00 and 14:00...)
>
> > > > > I didn't find any correlation between the time of day and the
> > > > > occurrence of this. Obviously, this is normalized to the usage load,
> > > > > as you implied.
>
> > > > > > o Do you have a world-wide net? If so, does the problem travel
> > > > > > across time zones?
>
> > > > > My users are not from around the world, but as I stated - this issue
> > > > > occurred when using hosting providers around the world.
>
> > > > > > Cheers,
> > > > > > jec
>
> > > > > > On Dec 2, 2:13 am, Amit Kasher <[EMAIL PROTECTED]> wrote:
>
> > > > > > > Hi,
> > > > > > > Does anyone have any new insights about this issue? We've been
> > > > > > > investigating for over a year(!), and we seem to not be the only
> > > > > > > ones...
>
> > > > > > > http://tinyurl.com/5rqfp5
>
> > > > > > > Thanks.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Google Web Toolkit" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/Google-Web-Toolkit?hl=en
-~----------~----~----~----~------~----~------~--~---
