We're scraping the web with node and have hit a problem: we cannot fetch
a particular URL on a particular web site, because the response includes
the header "Content-Length: 1234,1234".

I totally understand that node's http parser doesn't deal with this, and
throws an error, but is there any way we can intercept this and fix it up?
The only way I can think of is using a proxy written in another language,
which seems like a sucky solution.

Thoughts?

Here's some test code to demonstrate this:

var assert = require('assert');
var http = require('http');

var seen_req = false;

var server = http.createServer(function(req, res) {
  assert.equal('GET', req.method);
  assert.equal('/foo?bar', req.url);
  res.writeHead(200, {'Content-Type': 'text/plain', 'Content-Length': '6,6'});
  res.write('hello\n');
  res.end();
  server.close();
  seen_req = true;
});

server.listen(12345, function() {
  http.get('http://127.0.0.1:' + 12345 + '/foo?bar');
});

process.on('exit', function() {
  assert(seen_req);
});
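
For what it's worth, one in-process alternative to the proxy idea might be
to skip the http module for this one URL, talk to the server over a plain
socket with the net module, and parse the response by hand. Below is a
rough sketch of that. The names normalizeContentLength, parseRawResponse
and rawGet are made up for illustration and are not part of node. It
assumes a repeated Content-Length is safe to collapse only when every
value is identical, and it sends an HTTP/1.0 request with
"Connection: close" in the hope of avoiding chunked responses. It does not
handle HTTPS, redirects, or compression.

```javascript
var net = require('net');

// Collapse a duplicated Content-Length value such as "1234,1234" to 1234.
// Returns null when the parts disagree or are not plain decimal numbers.
function normalizeContentLength(value) {
  var parts = String(value).split(',').map(function(p) { return p.trim(); });
  var first = parts[0];
  if (!/^\d+$/.test(first)) return null;
  for (var i = 1; i < parts.length; i++) {
    if (parts[i] !== first) return null;
  }
  return parseInt(first, 10);
}

// Split a complete raw HTTP response (a Buffer) into status line,
// lower-cased headers, and body. Hypothetical helper, not a full parser.
function parseRawResponse(buf) {
  var sep = buf.indexOf('\r\n\r\n');
  if (sep === -1) throw new Error('incomplete response head');
  var lines = buf.slice(0, sep).toString('latin1').split('\r\n');
  var headers = {};
  lines.slice(1).forEach(function(line) {
    var idx = line.indexOf(':');
    if (idx > 0) {
      headers[line.slice(0, idx).trim().toLowerCase()] =
        line.slice(idx + 1).trim();
    }
  });
  var body = buf.slice(sep + 4);
  if (headers['content-length'] !== undefined) {
    var len = normalizeContentLength(headers['content-length']);
    if (len === null) throw new Error('unusable Content-Length');
    body = body.slice(0, len);
  }
  return { statusLine: lines[0], headers: headers, body: body };
}

// Fetch a path over a plain socket, bypassing node's HTTP parser.
// Buffers the whole response and calls back once the server closes.
function rawGet(host, port, path, callback) {
  var chunks = [];
  var done = false;
  function finish(err, result) {
    if (!done) { done = true; callback(err, result); }
  }
  var socket = net.connect(port, host, function() {
    socket.write('GET ' + path + ' HTTP/1.0\r\n' +
                 'Host: ' + host + '\r\n' +
                 'Connection: close\r\n\r\n');
  });
  socket.on('data', function(chunk) { chunks.push(chunk); });
  socket.on('end', function() {
    try {
      finish(null, parseRawResponse(Buffer.concat(chunks)));
    } catch (e) {
      finish(e);
    }
  });
  socket.on('error', finish);
}
```

Against the test server above, this would be called as
rawGet('127.0.0.1', 12345, '/foo?bar', function(err, res) { ... }),
with res.body holding the response body as a Buffer.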
