No worries man, glad everything works! Glad, since I was having hostname issues with nutch/hbase just now as I quickly tried to get it working/fixed for ya, ha.
Mo

On Fri, Feb 13, 2015 at 2:57 PM, Shuo Li <[email protected]> wrote:

> Hey guys,
>
> After changing my RAM to 2GB, everything works fine. My bad. Thanks for your
> help.
>
> Regards,
> Shuo Li
>
> On Fri, Feb 13, 2015 at 11:34 AM, Mattmann, Chris A (3980) <
> [email protected]> wrote:
>
>> Thank you Mo. I sincerely appreciate your guidance and contribution.
>>
>> I will work to get your nutch selenium grid plugin contributed
>> to work with Nutch 1.x.
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: [email protected]
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>> -----Original Message-----
>> From: Mo Omer <[email protected]>
>> Date: Friday, February 13, 2015 at 11:10 AM
>> To: Chris Mattmann <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Subject: Re: Vagrant Crushed When using Nutch-Selenium
>>
>> >Hey all,
>> >
>> >When I ran nutch-selenium, it was in a config such that zombies were
>> >created from closing Firefox windows and they couldn't be reaped (again,
>> >due to the docker configuration I had).
>> >
>> >In a normal setup, it should not be an issue - but if you're running 20
>> >threads in nutch, that's potentially 20 open FF windows, which isn't good
>> >for 512MB.
>> >
>> >Selenium grid is much more efficient, in that browsers are opened, but
>> >tabs are used to fetch sites - and only those are closed.
>> >
>> >Additionally, ensure you're using Nutch 2.2.1.
>> >
>> >Feel free to fork, patch, tinker, and PR as needed.
>> >
>> >Chris, if you want to be added to contribs on the GitHub project, that's
>> >cool with me! Wish I could dedicate more time to this, but I don't
>> >foresee using Nutch again in the near future, and am now working on
>> >projects that require lots of reading and possibly patches to Caffe and
>> >opencl r-CNN projects.
>> >
>> >Tl;dr:
>> >- no, this shouldn't be typical unless you're creating zombies like crazy
>> >and they're not being reaped (too many open file descriptors), running
>> >out of memory, or hitting a similar resource constraint.
>> >- selenium grid is TONs more efficient, but a bit more difficult to set
>> >up. I used it to crawl 100ks of sites.
>> >- unfortunately I can't commit more time to this, but if I can assist in
>> >any admin way, let me know.
>> >
>> >Thank you,
>> >
>> >Mo
>> >
>> >This message was drafted on a tiny touch screen; please forgive brevity &
>> >tpyos
>> >
>> >> On Feb 13, 2015, at 12:41 PM, "Mattmann, Chris A (3980)"
>> >> <[email protected]> wrote:
>> >>
>> >> Oh yes, please up your memory to at least 2GB.
>> >>
>> >> -----Original Message-----
>> >> From: Shuo Li <[email protected]>
>> >> Reply-To: "[email protected]" <[email protected]>
>> >> Date: Friday, February 13, 2015 at 10:38 AM
>> >> To: "[email protected]" <[email protected]>
>> >> Cc: Mo Omer <[email protected]>
>> >> Subject: Re: Vagrant Crushed When using Nutch-Selenium
>> >>
>> >>> Hey Mo and Prof Mattmann,
>> >>>
>> >>> I will try to crawl the 3 websites in the homework tonight (NASA AMD,
>> >>> NSF ACADIS and NSIDC Arctic Data Explorer). I will let you know what's
>> >>> going on.
>> >>>
>> >>> Is memory an issue? My vagrant only has 512MB of memory.
>> >>>
>> >>> Regards,
>> >>> Shuo Li
>> >>>
>> >>> On Fri, Feb 13, 2015 at 10:25 AM, Mattmann, Chris A (3980)
>> >>> <[email protected]> wrote:
>> >>>
>> >>> Hi Shuo,
>> >>>
>> >>> Thanks for your email. I wonder if using selenium grid would
>> >>> help?
>> >>>
>> >>> Please see this plugin:
>> >>>
>> >>> https://github.com/momer/nutch-selenium-grid-plugin
>> >>>
>> >>> I’m CC’ing Mo, the author of the plugin, to see if he experienced
>> >>> this while running the original selenium plugin - Mo, did using
>> >>> selenium grid help with the issue that Shuo is experiencing below?
>> >>>
>> >>> Mo: are you cool with porting the grid plugin to trunk, or with
>> >>> Lewis or me doing it (with full credit to you, of course)?
>> >>>
>> >>> Cheers,
>> >>> Chris
>> >>>
>> >>> -----Original Message-----
>> >>> From: Shuo Li <[email protected]>
>> >>> Reply-To: "[email protected]" <[email protected]>
>> >>> Date: Friday, February 13, 2015 at 10:12 AM
>> >>> To: "[email protected]" <[email protected]>
>> >>> Subject: Vagrant Crushed When using Nutch-Selenium
>> >>>
>> >>>> Hey guys,
>> >>>>
>> >>>> I'm trying to use Nutch-Selenium to crawl
>> >>>> nutch.apache.org <http://nutch.apache.org>.
>> >>>> However, my vagrant seems to have crashed after a few minutes. I
>> >>>> forced it to shut down, and it turns out it only crawled 59
>> >>>> websites. My nutch version is 1.10 and my OS is Ubuntu
>> >>>> Trusty, 14.04.
>> >>>>
>> >>>> Is there anything I can provide to you guys? Or has anybody had
>> >>>> the same issue? Or is 59 websites the complete crawl?
>> >>>>
>> >>>> Any suggestion would be appreciated.
>> >>>>
>> >>>> Regards,
>> >>>> Shuo Li
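
For anyone hitting the same crash, the memory point in this thread (one Firefox window per fetcher thread, 512MB versus 2GB) can be sketched as rough arithmetic. This is only an illustration: the function name `max_browser_windows`, the per-window footprint, and the memory reserved for the OS and Nutch are all assumptions made up for the example, not measurements from this thread.

```python
def max_browser_windows(vm_memory_mb: int, per_window_mb: int, reserved_mb: int = 0) -> int:
    """Return how many browser windows fit in the VM's memory budget."""
    if per_window_mb <= 0:
        raise ValueError("per_window_mb must be positive")
    available = vm_memory_mb - reserved_mb
    # Never report a negative count when the reserve already exceeds the VM's memory.
    return max(0, available // per_window_mb)


# Hypothetical figures chosen only for illustration, not measured values.
PER_WINDOW_MB = 150   # assumed footprint of one Firefox window
RESERVED_MB = 256     # assumed memory for the OS and the Nutch JVM

for vm_mb in (512, 2048):
    fits = max_browser_windows(vm_mb, PER_WINDOW_MB, RESERVED_MB)
    print(f"{vm_mb} MB VM: room for about {fits} Firefox window(s)")
```

Under these assumed numbers, a 512MB VM has room for far fewer windows than the 20 threads mentioned above, which is consistent with the crash going away at 2GB. Plug in figures measured on your own box before drawing conclusions.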

