Well, I did a bunch of testing on node 0.6, and after running for hours, I received no HTTP parsing errors, no JSON parsing errors on the returned data, and no unexpected socket disconnects. On node 0.8 I generally get one or two of these within 5-10 minutes. So node 0.8 is definitely doing something different that affects this, though the tcpdump clearly indicates the problem is on the other end, so there's basically nothing more to try, but it is still worrisome :(. I think we'll have to stick with node 0.6, at least in production, for a while yet.
For posterity, the thread I started with Amazon is here (no responses yet): https://forums.aws.amazon.com/thread.jspa?threadID=105268

Jimb Esser

On Tuesday, September 25, 2012 5:32:47 AM UTC-7, Ben Noordhuis wrote:
> On Tue, Sep 25, 2012 at 6:38 AM, Jimb Esser <[email protected]> wrote:
> > Yes, but, as I understand it, this is logging during the processing of a single stream of data, in a single call to http_parser_execute(), specifically inside a for loop iterating over exactly the data I logged. That should be internally consistent, at least, right? If the stream says it has an 8192-byte chunk coming, and there are more than that many bytes of data, then 8192 bytes later there should be another chunk size in the stream (which there is not, which of course is why it's throwing the HPE_INVALID_CHUNK_SIZE error as it reads some non-hex characters at the offset where it's expecting the next chunk size).
> >
> > Anyway, I managed to capture it as it went bad with tcpdump (I added -U -B 1000000; not sure if that helped or the timing was just different). The short result is: it looks like the data was bad as it came in over TCP! I'll go into detail and post the dumps below, but this quickly leads me to ask (after apologizing for thinking this was a node bug): why does this generally fail on 0.8 and never on 0.6?
> >
> > My guess is that on 0.6 an HTTP parse error simply disconnected the socket, and all of our AWS APIs retry any request that gets a socket disconnect (as those happen rather frequently with AWS). We also automatically retry on a bunch of 5xx HTTP response codes, so if any of those were being generated, we'd retry on those as well. I'll add some additional logging and run this on node 0.6 again to see if I can verify this.
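As an aside for readers following the quoted explanation above: the chunked-encoding invariant being described can be sketched as a toy parser. This is a hypothetical illustration, not the actual http_parser.c code (it also ignores chunk extensions and trailers, which the real parser handles); the error message merely mimics the HPE_INVALID_CHUNK_SIZE failure mode:

```javascript
// Hypothetical sketch (NOT http_parser.c itself): how a chunked-encoding
// parser walks a stream. Each chunk is "<hex size>\r\n<data>\r\n"; after
// consuming <size> bytes the parser expects another hex size line, and
// non-hex characters at that offset are the equivalent of the
// HPE_INVALID_CHUNK_SIZE error described above.
function parseChunked(buf) {
  var out = [];
  var pos = 0;
  for (;;) {
    var lineEnd = buf.indexOf('\r\n', pos);
    if (lineEnd === -1) throw new Error('truncated chunk size line');
    var sizeLine = buf.slice(pos, lineEnd);
    if (!/^[0-9a-fA-F]+$/.test(sizeLine)) {
      throw new Error('HPE_INVALID_CHUNK_SIZE at offset ' + pos +
                      ': ' + JSON.stringify(sizeLine));
    }
    var size = parseInt(sizeLine, 16);
    pos = lineEnd + 2;                        // skip past "\r\n"
    if (size === 0) return out.join('');      // terminating zero-size chunk
    out.push(buf.slice(pos, pos + size));     // exactly <size> bytes of data
    if (buf.slice(pos + size, pos + size + 2) !== '\r\n') {
      throw new Error('missing CRLF after chunk data at offset ' + (pos + size));
    }
    pos += size + 2;
  }
}
```

For example, parseChunked('4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n') returns 'Wikipedia', while a stream whose next size line is garbage throws, just like the symptom above.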
> > Back to the details on the symptom at hand (just in case I misinterpreted some of this data, it might be good to get another set of eyes on it). In this case, it didn't actually get an HTTP parser error; it simply handed back what looks like bad data (it includes the HTTP chunk size lines inserted in the middle of it), perhaps because the socket got closed or something (although it seems that if it got closed prematurely, that should have generated a parsing error, since it was expecting more bytes? I'm not sure on this, exactly). I've uploaded the parsed/printed TCP dump of the end of the relevant stream (starting from before the last good chunk size line) [1], as well as the tail of the data that was returned to my app, starting from the same point [2].
> >
> > As far as I can tell, the stream indicated there was an 8kb chunk coming; 8kb later it indicates another 8kb chunk is coming, but then 1296 bytes later there's another chunk size line ("\r\n2000\r\n"). At the packet level, the final packet (starting at seq 230681) appears to be simply repeated data from 1304 bytes back in the stream, but then it terminates unexpectedly (it never gets to the 8kb of data the earlier chunk size indicated was coming). No other packets show up in this stream in my dump.
> >
> > I cannot think of any possible thing on our side (either in my code or in node's) that could cause this, so my conclusion is that something is going wrong within Amazon's DynamoDB (or in routing between there and our servers, though I've tried both inside and outside of their datacenters with the same results). In retrospect, every bad packet I've gotten has been on DynamoDB queries, despite us slamming a couple of other Amazon APIs with similar levels of traffic during this test, so that also points the finger in that direction.
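For anyone wanting to repeat this kind of analysis on their own captures, a hypothetical helper (an assumption of mine, not part of the original investigation) could scan data that should already be de-chunked for leftover "\r\n<hex>\r\n" sequences, the signature of chunk-size lines leaking into the payload like the "\r\n2000\r\n" (0x2000 = 8192 bytes) described above:

```javascript
// Hypothetical debugging helper: report any "\r\n<hex>\r\n" sequences in a
// body that should contain no chunk framing. Each hit records the offset
// where the stray marker starts and the chunk size it decodes to.
function findStrayChunkMarkers(body) {
  var re = /\r\n([0-9a-fA-F]{1,8})\r\n/g;
  var hits = [];
  var m;
  while ((m = re.exec(body)) !== null) {
    hits.push({ offset: m.index, size: parseInt(m[1], 16) });
  }
  return hits;
}
```

Caveat: legitimate data that happens to look like a size line between CRLFs will also match, which is essentially the "accidentally parses correctly" hazard raised at the end of the thread, so hits need a human look.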
> > I think I will try to bring this up on the DynamoDB forums, and just make our DDB client more robust if it gets low-level parsing errors (although I'm a little afraid of what could happen if it happens to accidentally parse correctly because some of our data happens to match up!).
> >
> > Thank you very much for your help with this.
> >
> > [1] http://thesilentb.com/files/tcpdump.txt
> > [2] http://thesilentb.com/files/packet.txt
>
> Glad you got it solved - or the next best thing, know what the culprit is. :-)
>
> The reason that you're seeing it with 0.8 and not 0.6 probably is that http_parser.c became considerably more strict. Error handling on the JS level hasn't changed significantly.
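The "make our DDB client more robust" idea in the quoted text could look something like the following sketch. The names and retry policy here are my assumptions, not the poster's actual client code: retry on unexpected socket disconnects, low-level HTTP parser errors (HPE_*), JSON parse failures, and 5xx responses, up to a capped number of attempts.

```javascript
// Hypothetical retry wrapper (illustration only). doRequest is any function
// taking a node-style callback (err, statusCode, body); the wrapper re-issues
// the request while the failure looks retryable and attempts remain.
function requestWithRetry(doRequest, maxAttempts, callback) {
  var attempt = 0;
  function tryOnce() {
    attempt++;
    doRequest(function (err, statusCode, body) {
      var retryable =
        (err && (err.code === 'ECONNRESET' ||      // unexpected disconnect
                 /^HPE_/.test(err.code || '') ||   // http parser errors
                 err instanceof SyntaxError)) ||   // JSON.parse failure
        (statusCode >= 500 && statusCode < 600);   // server-side 5xx
      if (retryable && attempt < maxAttempts) return tryOnce();
      callback(err, statusCode, body);
    });
  }
  tryOnce();
}
```

Note the same caveat the poster raises: retrying around parse errors only helps if a corrupted response actually fails to parse; a response that accidentally parses cleanly sails straight through.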
