I'm learning Node and decided to build a web crawler. It works well enough, but when I try to crawl a site like reddit I start running into severe memory issues.
The goal of the crawler is to take a provided URL, crawl that page, gather all of the internal links and crawl them as well, then store the HTML from every page either in a MongoDB database or on the file system. Since I'm working with large amounts of data, it's important that I understand garbage collection in Node, but no matter what I do I can't seem to improve the memory usage. Any chance one of you with more expertise could take a look and help me figure out where my leaks are? Git repo here: https://github.com/jcrowe206/crawler
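
To give an idea of the flow without digging into the repo, here's a rough sketch of what the crawler is doing (simplified, not the actual code from the repo; the start URL, concurrency limit, and output directory are just placeholders):

```js
// Simplified sketch of the crawl flow: fetch a page, write the HTML to disk,
// extract same-host links, and queue them with a small concurrency limit.
const https = require('https');
const http = require('http');
const fs = require('fs');
const path = require('path');
const { URL } = require('url');

const visited = new Set();       // URLs we have already fetched
const queue = [];                // URLs still to be crawled
const MAX_CONCURRENT = 5;        // placeholder: requests in flight at once
const OUT_DIR = path.join(__dirname, 'pages'); // placeholder output directory
let active = 0;

fs.mkdirSync(OUT_DIR, { recursive: true });

function fetchPage(pageUrl, cb) {
  const lib = pageUrl.startsWith('https') ? https : http;
  lib.get(pageUrl, (res) => {
    let html = '';
    res.setEncoding('utf8');
    res.on('data', (chunk) => { html += chunk; });
    res.on('end', () => cb(null, html));
  }).on('error', cb);
}

function extractLinks(html, baseUrl) {
  // naive href extraction; only keep links on the same host
  const links = [];
  const re = /href="([^"#]+)"/g;
  const baseHost = new URL(baseUrl).host;
  let m;
  while ((m = re.exec(html)) !== null) {
    try {
      const resolved = new URL(m[1], baseUrl);
      if (resolved.host === baseHost) links.push(resolved.href);
    } catch (e) { /* ignore malformed URLs */ }
  }
  return links;
}

function next() {
  while (active < MAX_CONCURRENT && queue.length > 0) {
    const url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);
    active++;
    fetchPage(url, (err, html) => {
      active--;
      if (!err) {
        // write the page to disk rather than keeping it in memory
        const file = path.join(OUT_DIR, encodeURIComponent(url) + '.html');
        fs.writeFile(file, html, () => {});
        extractLinks(html, url).forEach((link) => {
          if (!visited.has(link)) queue.push(link);
        });
      }
      next();
    });
  }
}

queue.push('https://example.com/'); // placeholder start URL
next();
```

Even with roughly this structure, the memory keeps climbing on link-heavy sites like reddit, which is what makes me think I'm misunderstanding how garbage collection works here.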
