By heavy load I mean network traffic, whether on your end, their end, or
any hop in between. "In the first packet" is certainly *not* something I'd
recommend anyone depend on: TCP is a byte stream, and where the segment
boundaries fall depends on a whole lot of things outside your control.

The monkey patching is gross, but hey, it works. The only thing here that's
going to come back at you is the assumption about TCP segments. I urge you
to rework that hack to search character-by-character and across packet
boundaries.
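
A minimal sketch of that idea follows. It assumes a current Node Buffer API
(Buffer.concat, Buffer.from, buf.indexOf), which node 0.6 does not have, and
the helper name createContentLengthFixer is made up for illustration. Rather
than scanning one packet, it holds chunks back until the blank line that ends
the headers has arrived, repairs the Content-Length line, and passes the body
through untouched. How it gets wired into the socket is left out.

```javascript
// Hypothetical helper, not part of node: returns a stateful function that
// takes each incoming chunk and returns the bytes that are safe to forward.
function createContentLengthFixer() {
  var pending = [];   // chunks received before the end of the headers
  var done = false;   // true once the header block has been forwarded

  return function push(chunk) {
    if (done) return chunk;                  // body bytes: pass through as-is
    pending.push(chunk);
    var head = Buffer.concat(pending);
    var end = head.indexOf('\r\n\r\n');      // blank line that ends the headers
    if (end === -1) return Buffer.alloc(0);  // headers incomplete: keep waiting
    done = true;
    pending = [];
    var headers = head.slice(0, end).toString('latin1');
    // Case-insensitive match; keep only the digits before the first comma.
    headers = headers.replace(/^(content-length:[ \t]*\d+)[ \t]*,.*$/im, '$1');
    return Buffer.concat([Buffer.from(headers, 'latin1'), head.slice(end)]);
  };
}
```

Because nothing is forwarded until the whole header block is in hand, the
result is the same no matter where the stream happens to be split, and the
match no longer depends on how the server capitalizes the header name.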

On Tue, Jan 8, 2013 at 3:49 PM, Matt <[email protected]> wrote:

> Exactly, it's designed for this one service which always sends the
> Content-Length capitalized like this, and screwed up with the comma, and in
> the first packet. If there's other screwy things in the future we can deal
> with them then. Believe me I know all about parsing headers (I had to write
> the parser for SMTP headers by hand for Haraka). It's far easier to fix
> this in JS code than to have to remember to patch node every time we
> upgrade.
>
> Not sure what you're thinking about with "under heavy load" - I can't
> imagine why that would affect anything. Got some way to elaborate on that?
>
>
> On Tue, Jan 8, 2013 at 4:42 PM, Marcel Laverdet <[email protected]> wrote:
>
>> omg I can't believe you've done this.
>>
>> Obviously this won't work if the server doesn't send
>> "Content-Length" capitalized like you have here, but if you're only
>> designing against one service that's not a huge issue. You should be aware
>> though that this may fail in certain rare circumstances, or under heavy
>> load. If the response header is received in two TCP segments the parsing
>> here will fail. Parsing is difficult and half-assing it will come back to
>> bite you in the future.
>>
>>
>> On Tue, Jan 8, 2013 at 3:01 PM, Matt <[email protected]> wrote:
>>
>>> Rather than go into patching anything, I managed to get this to work:
>>>
>>>     r.on('request', function (req) {
>>>         req.on('socket', function () {
>>>             var oldOnData = req.socket.ondata;
>>>             var first_packet = true;
>>>             req.socket.ondata = function (d, start, end) {
>>>                 if (first_packet) {
>>>                     first_packet = false;
>>>                     var pos = d.indexOf("Content-Length:", start);
>>>                     if (pos === -1) {
>>>                         return oldOnData.apply(req.socket, arguments);
>>>                     }
>>>                     var seen_comma = false;
>>>                     var i = pos + 15; // pos is already an absolute offset into d
>>>                     while (i < end && d[i] !== 0x0a) {
>>>                         console.log("Saw: " + String.fromCharCode(d[i]) + " (" + d[i] + ") at pos: " + i, "blue");
>>>                         if (d[i] === 44) {
>>>                             seen_comma = true;
>>>                         }
>>>                         if (seen_comma) {
>>>                             d[i] = 32; // set to space
>>>                         }
>>>                         i++;
>>>                     }
>>>                 }
>>>                 return oldOnData.apply(req.socket, arguments);
>>>             }
>>>         })
>>>     })
>>>
>>> Hacky and a bit nasty, but works, at least with node 0.6 (have to check
>>> if the same process applies on 0.8).
>>>
>>>
>>> On Tue, Jan 8, 2013 at 3:18 PM, Marcel Laverdet <[email protected]> wrote:
>>>
>>>> Apply this patch:
>>>> https://gist.github.com/4487528
>>>>
>>>> Node shouldn't be barfing on anything a browser can display and should
>>>> really be more tolerant of these failures. I should submit a PR, but I'm
>>>> not sure if this will cause other issues down the road.
>>>>
>>>> On Tue, Jan 8, 2013 at 12:42 PM, Matt <[email protected]> wrote:
>>>>
>>>>>  We're doing web scraping using node and have run into an issue where
>>>>> we cannot fetch a particular URL on a particular web site, because it
>>>>> sends back: "Content-Length: 1234,1234"
>>>>>
>>>>>  I totally understand that node's http parser doesn't deal with this,
>>>>> and throws an error, but is there any way we can intercept this and fix it
>>>>> up? The only way I can think of is using a proxy written in another
>>>>> language, which seems like a sucky solution.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Here's some test code to demonstrate this:
>>>>>
>>>>> var assert = require('assert');
>>>>> var http = require('http');
>>>>>
>>>>> var seen_req = false;
>>>>>
>>>>> var server = http.createServer(function(req, res) {
>>>>>   assert.equal('GET', req.method);
>>>>>   assert.equal('/foo?bar', req.url);
>>>>>   res.writeHead(200, {'Content-Type': 'text/plain', 'Content-Length': '6,6'});
>>>>>   res.write('hello\n');
>>>>>   res.end();
>>>>>   server.close();
>>>>>   seen_req = true;
>>>>> });
>>>>>
>>>>> server.listen(12345, function() {
>>>>>   http.get('http://127.0.0.1:' + 12345 + '/foo?bar');
>>>>> });
>>>>>
>>>>> process.on('exit', function() {
>>>>>   assert(seen_req);
>>>>> });
>>>>>
>>>>>  --
>>>>> Job Board: http://jobs.nodejs.org/
>>>>> Posting guidelines:
>>>>> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "nodejs" group.
>>>>> To post to this group, send email to [email protected]
>>>>> To unsubscribe from this group, send email to
>>>>> [email protected]
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/nodejs?hl=en?hl=en
>>>>>
