Hi Jason,

This looks like a job for async's queue<https://github.com/caolan/async#queue>.
It will help you correctly manage a very large number of download tasks by
working through them at a fixed concurrency. Refer to my example and the
documentation for how to use it.

See this updated version of your gist <https://gist.github.com/3872261>.

Julian Lannigan



On Thu, Oct 11, 2012 at 12:13 AM, Jason Venable <[email protected]> wrote:

> First a disclaimer that I am very new to node so please point me in the
> right direction if this is not the right place or I have not provided
> enough or the correct details.
>
> I have a use case where I need to perform many HTTP GET requests against
> many URLs. This could be in the thousands. There is no need at all for
> these to be synchronous or to complete in order. Node seems like a great
> fit, but there is one thing I can't quite figure out.
>
> I have a loop that queues up a ton of HTTP GET requests, and node starts
> chipping away at these, but in this big pool there may be some URLs that
> will time out. I'd like to capture these individual URL timeouts and handle
> them appropriately. I have attached .setTimeout() to the get request, but
> this appears to be tied to an individual socket, and that socket could very
> well be handling or queued up with 100s of requests (if not all of them).
> If one URL times out, the entire socket gets killed, also timing out the
> other URLs that are pending.
>
> Code to reproduce is below. It simply does 1000 GETs against different
> google domains and sets a 5-second timeout via setTimeout(). It works great
> for about 5 seconds, and then nearly every request that has been queued up
> is killed. I assume this is because I am killing the entire socket that all
> the pooled requests depend on.
>
> Do I need to open a unique socket for each request? How would I go about
> doing this?
>
> Is there anything else I am doing wrong, or should be doing differently, to
> handle this type of scenario?
>
> The code: https://gist.github.com/3870100
>
> var http = require('http');
>
> var extensions = [
>   'com', 'ac', 'ad', 'ae', 'com.af', 'com.ag', 'com.ai', 'am', 'it.ao',
>   'com.ar', 'as', 'at', 'com.au', 'az', 'ba', 'com.bd', 'be', 'bf', 'bg',
>   'com.bh', 'bi', 'bj', 'com.bn', 'com.bo', 'com.br', 'bs', 'co.bw',
>   'com.by', 'com.bz', 'ca', 'com.kh', 'cc', 'cd', 'cf', 'cat', 'cg', 'ch',
>   'ci', 'co.ck', 'cl', 'cm', 'cn', 'com.co', 'co.cr', 'com.cu', 'cv', 'cz',
>   'de', 'dj', 'dk', 'dm', 'com.do', 'dz', 'com.ec', 'ee', 'com.eg', 'es',
>   'com.et', 'fi', 'com.fj', 'fm', 'fr', 'ga', 'gd', 'ge', 'gf', 'gg',
>   'com.gh', 'com.gi', 'gl', 'gm', 'gp', 'gr', 'com.gt', 'gy', 'com.hk',
>   'hn', 'hr', 'ht', 'hu', 'co.id', 'iq', 'ie', 'co.il', 'im', 'co.in',
>   'io', 'is', 'it', 'je', 'com.jm', 'jo', 'co.jp', 'co.ke', 'com.kh',
>   'ki', 'kg', 'co.kr', 'com.kw', 'kz', 'la', 'com.lb', 'com.lc', 'li',
>   'lk', 'co.ls', 'lt', 'lu', 'lv', 'com.ly', 'co.ma', 'md', 'me', 'mg',
>   'mk', 'ml', 'mn', 'ms', 'com.mt', 'mu', 'mv', 'mw', 'com.mx', 'com.my',
>   'co.mz', 'com.na', 'ne', 'com.nf', 'com.ng', 'com.ni', 'nl', 'no',
>   'com.np', 'nr', 'nu', 'co.nz', 'com.om', 'com.pa', 'com.pe', 'com.ph',
>   'com.pk', 'pl', 'pn', 'com.pr', 'ps', 'pt', 'com.py', 'com.qa', 'ro',
>   'rs', 'ru', 'rw', 'com.sa', 'com.sb', 'sc', 'se', 'com.sg', 'sh', 'si',
>   'sk', 'com.sl', 'sn', 'sm', 'so', 'st', 'com.sv', 'td', 'tg', 'co.th',
>   'com.tj', 'tk', 'tl', 'tm', 'to', 'com.tn', 'com.tr', 'tt', 'com.tw',
>   'co.tz', 'com.ua', 'co.ug', 'co.uk', 'us', 'com.uy', 'co.uz', 'com.vc',
>   'co.ve', 'vg', 'co.vi', 'com.vn', 'vu', 'ws', 'co.za', 'co.zm', 'co.zw'
> ];
> var extCount = 0;
>
> // Do a lot of http get requests to different google pages
> for (var i = 0; i < 1000; i++) {
>   var url = 'www.google.' + extensions[extCount];
>   doRequest(url);
>   // Increment the extension counter
>   extCount++;
>   // Reset the extension counter if necessary
>   if (extCount == extensions.length) {
>     extCount = 0;
>   }
> }
>
> function doRequest(url) {
>   http.get({host: url}, function(res) {
>     console.log(url + ' : ' + res.statusCode);
>   }).on('error', function(e) {
>     console.log(url + ' error: ' + e.message);
>   }).setTimeout(5000, function() {
>     this.abort(); // This kills all pooled/queued http get requests waiting to be processed
>     // How do I catch a timeout for each individual url request?
>   });
> }
>
> Oh, and should I post this on stack exchange or are double posts frowned
> upon?
>
> Thanks,
> Jason
>
> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines:
> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/nodejs?hl=en
>
