Hi Jason,

This looks like a job for async's queue <https://github.com/caolan/async#queue>. It will help you correctly manage a very large number of download tasks. Refer to my example and the documentation on how to use it.
See this updated version of your gist: <https://gist.github.com/3872261>

Julian Lannigan

On Thu, Oct 11, 2012 at 12:13 AM, Jason Venable <[email protected]> wrote:

> First a disclaimer that I am very new to node, so please point me in the
> right direction if this is not the right place or I have not provided
> enough or the correct details.
>
> I have a use case where I need to perform many HTTP GET requests against
> many URLs. This could be in the thousands. There is no need at all for
> these to be synchronous or to complete in order. Node seems like a great
> fit, but there is one thing I can't quite figure out.
>
> I have a loop that queues up a ton of HTTP GET requests, and node starts
> chipping away at these, but in this big pool there may be some URLs that
> will time out. I'd like to capture these individual URL timeouts and
> handle them appropriately. I have set the .setTimeout() method on the GET
> request, but this appears to be attached to an individual socket, and it
> appears this socket could very well be handling or queued up with 100s of
> requests (if not all of them). If one URL times out, the entire socket
> gets killed, also timing out the other URLs that are pending.
>
> Code to reproduce is below. It simply does 1000 GETs against different
> Google domains and has a setTimeout() option of 5 seconds. It works great
> for about 5 seconds, and then nearly every request that has been queued up
> is killed. I assume this is because I am killing the entire socket all the
> pooled requests are depending on.
>
> Do I need to open a unique socket for each request? How would I go about
> doing this?
>
> Is there anything else I am doing wrong or should be doing differently to
> handle this type of scenario?
> The code: https://gist.github.com/3870100
>
> var http = require('http');
>
> var extensions = ['com', 'ac', 'ad', 'ae', 'com.af', 'com.ag', 'com.ai', 'am', 'it.ao',
>   'com.ar', 'as', 'at', 'com.au', 'az', 'ba', 'com.bd', 'be', 'bf', 'bg', 'com.bh', 'bi',
>   'bj', 'com.bn', 'com.bo', 'com.br', 'bs', 'co.bw', 'com.by', 'com.bz', 'ca', 'com.kh',
>   'cc', 'cd', 'cf', 'cat', 'cg', 'ch', 'ci', 'co.ck', 'cl', 'cm', 'cn', 'com.co', 'co.cr',
>   'com.cu', 'cv', 'cz', 'de', 'dj', 'dk', 'dm', 'com.do', 'dz', 'com.ec', 'ee', 'com.eg',
>   'es', 'com.et', 'fi', 'com.fj', 'fm', 'fr', 'ga', 'gd', 'ge', 'gf', 'gg', 'com.gh',
>   'com.gi', 'gl', 'gm', 'gp', 'gr', 'com.gt', 'gy', 'com.hk', 'hn', 'hr', 'ht', 'hu',
>   'co.id', 'iq', 'ie', 'co.il', 'im', 'co.in', 'io', 'is', 'it', 'je', 'com.jm', 'jo',
>   'co.jp', 'co.ke', 'com.kh', 'ki', 'kg', 'co.kr', 'com.kw', 'kz', 'la', 'com.lb',
>   'com.lc', 'li', 'lk', 'co.ls', 'lt', 'lu', 'lv', 'com.ly', 'co.ma', 'md', 'me', 'mg',
>   'mk', 'ml', 'mn', 'ms', 'com.mt', 'mu', 'mv', 'mw', 'com.mx', 'com.my', 'co.mz',
>   'com.na', 'ne', 'com.nf', 'com.ng', 'com.ni', 'nl', 'no', 'com.np', 'nr', 'nu', 'co.nz',
>   'com.om', 'com.pa', 'com.pe', 'com.ph', 'com.pk', 'pl', 'pn', 'com.pr', 'ps', 'pt',
>   'com.py', 'com.qa', 'ro', 'rs', 'ru', 'rw', 'com.sa', 'com.sb', 'sc', 'se', 'com.sg',
>   'sh', 'si', 'sk', 'com.sl', 'sn', 'sm', 'so', 'st', 'com.sv', 'td', 'tg', 'co.th',
>   'com.tj', 'tk', 'tl', 'tm', 'to', 'com.tn', 'com.tr', 'tt', 'com.tw', 'co.tz', 'com.ua',
>   'co.ug', 'co.uk', 'us', 'com.uy', 'co.uz', 'com.vc', 'co.ve', 'vg', 'co.vi', 'com.vn',
>   'vu', 'ws', 'co.za', 'co.zm', 'co.zw'];
>
> var extCount = 0;
>
> // Do a lot of http get requests to different google pages
> for (var i = 0; i < 1000; i++) {
>   var url = 'www.google.' + extensions[extCount];
>   doRequest(url);
>   // Increment the extension counter
>   extCount++;
>   // Reset the extension counter if necessary
>   if (extCount == extensions.length) {
>     extCount = 0;
>   }
> }
>
> function doRequest(url) {
>   http.get({host: url}, function(res) {
>     console.log(url + ' : ' + res.statusCode);
>   }).on('error', function(e) {
>     console.log(url + ' error: ' + e.message);
>   }).setTimeout(5000, function() {
>     this.abort(); // This kills all pooled/queued http get requests waiting to be processed
>     // How do I catch a timeout for each individual url request?
>   });
> }
>
> Oh, and should I post this on stack exchange or are double posts frowned
> upon?
>
> Thanks,
> Jason
>
> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines:
> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/nodejs?hl=en
