Hi Mo,

I have a question about the Selenium plugin on Mac. I think I set it up successfully, but I am unsure about the performance. I am using a Mac with an Intel Core i5 processor and 8 GB of RAM, and each URL fetched takes about 1 second to open and close the Firefox window. Is that a normal speed, or is something wrong? Also, is it possible to install the Selenium grid plugin on a Mac? I will cry if you ask me to change machines now......

Best,
Jiaxin

On Fri, Feb 13, 2015 at 2:09 PM, Mohammed Omer <[email protected]> wrote:

> No worries man, glad everything works! Glad, since I was having hostname
> issues with nutch/hbase just now as I quickly tried to get it working/fixed
> for ya, ha.
>
> Mo
>
> On Fri, Feb 13, 2015 at 2:57 PM, Shuo Li <[email protected]> wrote:
>
>> Hey guys,
>>
>> After change my RAM to 2GB, everything works fine. My bad. Thanks for
>> your help.
>>
>> Regards,
>> Shuo Li
>>
>> On Fri, Feb 13, 2015 at 11:34 AM, Mattmann, Chris A (3980) <
>> [email protected]> wrote:
>>
>>> Thank you Mo. I sincerely appreciate your guidance and contribution.
>>>
>>> I will work to get your nutch selenium grid plugin contributed
>>> to work with Nutch 1.x.
>>>
>>> Cheers,
>>> Chris
>>>
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Chris Mattmann, Ph.D.
>>> Chief Architect
>>> Instrument Software and Science Data Systems Section (398)
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 168-519, Mailstop: 168-527
>>> Email: [email protected]
>>> WWW: http://sunset.usc.edu/~mattmann/
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Adjunct Associate Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>> -----Original Message-----
>>> From: Mo Omer <[email protected]>
>>> Date: Friday, February 13, 2015 at 11:10 AM
>>> To: Chris Mattmann <[email protected]>
>>> Cc: "[email protected]" <[email protected]>
>>> Subject: Re: Vagrant Crushed When using Nutch-Selenium
>>>
>>> >Hey all,
>>> >
>>> >When I had run nutch-selenium, it was in a config such that zombies were
>>> >created from closing Firefox windows and they couldn't be reaped (again,
>>> >due to the docker configuration I had).
>>> >
>>> >In a normal setup, it should not be an issue - if you're running 20
>>> >threads in nutch that's potentially 20 open FF windows which isn't good
>>> >for 512mb.
>>> >
>>> >Selenium grid is much more efficient, in that browsers are opened, but
>>> >tabs are used to fetch sites - and only those are closed.
>>> >
>>> >Additionally, ensure you're using Nutch 2.2.1.
>>> >
>>> >Feel free to fork patch and tinker and PR as needed.
>>> >
>>> >Chris, if you want to be added to contribs on the GitHub project, that's
>>> >cool with me! Wish I could dedicate more time to this, but I don't
>>> >foresee using Nutch again in the near future, and am now working on
>>> >projects that require lots of reading and possibly patches to Caffe and
>>> >opencl r-CNN projects.
>>> >
>>> >Tl;dr:
>>> >- no, this shouldn't be typical unless you're creating zombies like crazy
>>> >and they're not being reaped (too many open file descriptors), running
>>> >out of memory, or similar resource constraint.
>>> >- selenium grid is TONs more efficient, but a bit more difficult to set
>>> >up. I used it to crawl 100ks of sites.
>>> >- unfortunately I can't commit more time to this, but if I can assist in
>>> >any admin way, let me know.
>>> >
>>> >Thank you,
>>> >
>>> >Mo
>>> >
>>> >This message was drafted on a tiny touch screen; please forgive brevity &
>>> >tpyos
>>> >
>>> >> On Feb 13, 2015, at 12:41 PM, "Mattmann, Chris A (3980)"
>>> >><[email protected]> wrote:
>>> >>
>>> >> Oh yes, please up your memory to like at least 2Gb..
>>> >>
>>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> >> Chris Mattmann, Ph.D.
>>> >> Chief Architect
>>> >> Instrument Software and Science Data Systems Section (398)
>>> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> >> Office: 168-519, Mailstop: 168-527
>>> >> Email: [email protected]
>>> >> WWW: http://sunset.usc.edu/~mattmann/
>>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> >> Adjunct Associate Professor, Computer Science Department
>>> >> University of Southern California, Los Angeles, CA 90089 USA
>>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> >>
>>> >> -----Original Message-----
>>> >> From: Shuo Li <[email protected]>
>>> >> Reply-To: "[email protected]" <[email protected]>
>>> >> Date: Friday, February 13, 2015 at 10:38 AM
>>> >> To: "[email protected]" <[email protected]>
>>> >> Cc: Mo Omer <[email protected]>
>>> >> Subject: Re: Vagrant Crushed When using Nutch-Selenium
>>> >>
>>> >>> Hey Mo and Prof Mattmann,
>>> >>>
>>> >>> I will try to crawl the 3 websites in the homework tonight (NASA AMD, NSF
>>> >>> ACADIS and NSIDC Arctic Data Explorer). I will let you know what's going
>>> >>> on.
>>> >>>
>>> >>> Is memory an issue? My vagrant only has 512MB of memory.
>>> >>>
>>> >>> Regards,
>>> >>> Shuo Li
>>> >>>
>>> >>> On Fri, Feb 13, 2015 at 10:25 AM, Mattmann, Chris A (3980)
>>> >>> <[email protected]> wrote:
>>> >>>
>>> >>> Hi Shuo,
>>> >>>
>>> >>> Thanks for your email. I wonder if using selenium grid would
>>> >>> help?
>>> >>>
>>> >>> Please see this plugin:
>>> >>>
>>> >>> https://github.com/momer/nutch-selenium-grid-plugin
>>> >>>
>>> >>> I’m CC’ing Mo the author of the plugin to see if he experienced
>>> >>> this while running the original selenium plugin - Mo did using
>>> >>> selenium grid help the issue that Shuo is experiencing below?
>>> >>>
>>> >>> Mo: are you cool with portion the grid plugin, or if Lewis or
>>> >>> I do it to trunk (with full credit to you of course?)
>>> >>>
>>> >>> Cheers,
>>> >>> Chris
>>> >>>
>>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> >>> Chris Mattmann, Ph.D.
>>> >>> Chief Architect
>>> >>> Instrument Software and Science Data Systems Section (398)
>>> >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> >>> Office: 168-519, Mailstop: 168-527
>>> >>> Email: [email protected]
>>> >>> WWW: http://sunset.usc.edu/~mattmann/
>>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> >>> Adjunct Associate Professor, Computer Science Department
>>> >>> University of Southern California, Los Angeles, CA 90089 USA
>>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> >>>
>>> >>> -----Original Message-----
>>> >>> From: Shuo Li <[email protected]>
>>> >>> Reply-To: "[email protected]" <[email protected]>
>>> >>> Date: Friday, February 13, 2015 at 10:12 AM
>>> >>> To: "[email protected]" <[email protected]>
>>> >>> Subject: Vagrant Crushed When using Nutch-Selenium
>>> >>>
>>> >>>> Hey guys,
>>> >>>>
>>> >>>> I'm trying to use Nutch-Selenium to crawl
>>> >>>> nutch.apache.org <http://nutch.apache.org>.
>>> >>>> However, my vagrant seems
>>> >>>> crushed after a few minutes. I forced it to shut down and it turns out it
>>> >>>> only crawled 59 websites. My nutch version is 1.10 and my OS is Ubuntu
>>> >>>> Trusty, 14.04.
>>> >>>>
>>> >>>> Is there anything I can provide to you guys? Or is there anybody have the
>>> >>>> same issue? Or 59 websites is the complete crawling?
>>> >>>>
>>> >>>> Any suggestion would be appreciated.
>>> >>>>
>>> >>>> Regards,
>>> >>>> Shuo Li

