Hi Mo,

I have a question about the Selenium plugin on Mac. I think I set it up
successfully, but I am unsure about the performance.
I am using a Mac with an Intel Core i5 processor and 8GB of RAM, and I found
that each URL fetched takes about 1 second to open and close
the Firefox window. Is that a normal speed, or is something wrong? And is it
possible to install the Selenium Grid plugin on a Mac? I will cry if you
ask me to change machines now......
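
For a sense of scale, here is a rough back-of-the-envelope sketch in Python
of what about 1 second per URL adds up to. The function name and the
fetcher_threads parameter are hypothetical, and the sketch assumes the
threads run perfectly in parallel, which a real crawl will not quite reach.

```python
def estimated_overhead_seconds(num_urls, seconds_per_url=1.0, fetcher_threads=1):
    """Estimate wall-clock time spent only on opening and closing browser windows.

    seconds_per_url defaults to the roughly 1 second observed per fetched URL.
    fetcher_threads is an assumed degree of ideal parallelism (hypothetical).
    """
    if fetcher_threads < 1:
        raise ValueError("fetcher_threads must be at least 1")
    return num_urls * seconds_per_url / fetcher_threads


if __name__ == "__main__":
    # Compare a single fetcher thread against 20 threads for a few crawl sizes.
    for num_urls in (1000, 100000):
        single = estimated_overhead_seconds(num_urls)
        parallel = estimated_overhead_seconds(num_urls, fetcher_threads=20)
        print(num_urls, "URLs:", single, "s with 1 thread,", parallel, "s with 20 threads")
```

So the per-URL cost only becomes a concern for large crawls or low thread
counts, under these assumptions.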

Best,
Jiaxin

On Fri, Feb 13, 2015 at 2:09 PM, Mohammed Omer <[email protected]>
wrote:

> No worries man, glad everything works! Glad, since I was having hostname
> issues with nutch/hbase just now as I quickly tried to get it working/fixed
> for ya, ha.
>
> Mo
>
> On Fri, Feb 13, 2015 at 2:57 PM, Shuo Li <[email protected]> wrote:
>
>> Hey guys,
>>
>> After changing my RAM to 2GB, everything works fine. My bad. Thanks for
>> your help.
>>
>> Regards,
>> Shuo Li
>>
>> On Fri, Feb 13, 2015 at 11:34 AM, Mattmann, Chris A (3980) <
>> [email protected]> wrote:
>>
>>> Thank you Mo. I sincerely appreciate your guidance and contribution.
>>>
>>> I will work to get your nutch selenium grid plugin contributed
>>> to work with Nutch 1.x.
>>>
>>> Cheers,
>>> Chris
>>>
>>>
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Chris Mattmann, Ph.D.
>>> Chief Architect
>>> Instrument Software and Science Data Systems Section (398)
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 168-519, Mailstop: 168-527
>>> Email: [email protected]
>>> WWW:  http://sunset.usc.edu/~mattmann/
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Adjunct Associate Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Mo Omer <[email protected]>
>>> Date: Friday, February 13, 2015 at 11:10 AM
>>> To: Chris Mattmann <[email protected]>
>>> Cc: "[email protected]" <[email protected]>
>>> Subject: Re: Vagrant Crushed When using Nutch-Selenium
>>>
>>> >Hey all,
>>> >
>>> >When I had run nutch-selenium, it was in a config such that zombies were
>>> >created from closing Firefox windows and they couldn't be reaped (again,
>>> >due to the docker configuration I had).
>>> >
>>> >In a normal setup, it should not be an issue - if you're running 20
>>> >threads in nutch that's potentially 20 open FF windows, which isn't good
>>> >for 512MB.
>>> >
>>> >Selenium grid is much more efficient, in that browsers are opened, but
>>> >tabs are used to fetch sites - and only those are closed.
>>> >
>>> >Additionally, ensure you're using Nutch 2.2.1.
>>> >
>>> >Feel free to fork, patch, tinker, and PR as needed.
>>> >
>>> >Chris, if you want to be added to contribs on the GitHub project, that's
>>> >cool with me! Wish I could dedicate more time to this, but I don't
>>> >foresee using Nutch again in the near future, and am now working on
>>> >projects that require lots of reading and possibly patches to Caffe and
>>> >opencl r-CNN projects.
>>> >
>>> >Tl;dr:
>>> >- no, this shouldn't be typical unless you're creating zombies like
>>> >crazy and they're not being reaped (too many open file descriptors),
>>> >running out of memory, or hitting a similar resource constraint.
>>> >- selenium grid is TONs more efficient, but a bit more difficult to set
>>> >up. I used it to crawl 100ks of sites.
>>> >- unfortunately I can't commit more time to this, but if I can assist in
>>> >any admin way, let me know.
>>> >
>>> >Thank you,
>>> >
>>> >Mo
>>> >
>>> >This message was drafted on a tiny touch screen; please forgive brevity &
>>> >tpyos
>>> >
>>> >> On Feb 13, 2015, at 12:41 PM, "Mattmann, Chris A (3980)"
>>> >><[email protected]> wrote:
>>> >>
>>> >> Oh yes, please up your memory to like at least 2GB..
>>> >>
>>> >> -----Original Message-----
>>> >> From: Shuo Li <[email protected]>
>>> >> Reply-To: "[email protected]" <[email protected]>
>>> >> Date: Friday, February 13, 2015 at 10:38 AM
>>> >> To: "[email protected]" <[email protected]>
>>> >> Cc: Mo Omer <[email protected]>
>>> >> Subject: Re: Vagrant Crushed When using Nutch-Selenium
>>> >>
>>> >>> Hey Mo and Prof Mattmann,
>>> >>>
>>> >>>
>>> >>> I will try to crawl the 3 websites in the homework tonight (NASA AMD,
>>> >>> NSF ACADIS and NSIDC Arctic Data Explorer). I will let you know what's
>>> >>> going on.
>>> >>>
>>> >>>
>>> >>> Is memory an issue? My vagrant only has 512MB of memory.
>>> >>>
>>> >>>
>>> >>> Regards,
>>> >>> Shuo Li
>>> >>>
>>> >>>
>>> >>> On Fri, Feb 13, 2015 at 10:25 AM, Mattmann, Chris A (3980)
>>> >>> <[email protected]> wrote:
>>> >>>
>>> >>> Hi Shuo,
>>> >>>
>>> >>> Thanks for your email. I wonder if using selenium grid would
>>> >>> help?
>>> >>>
>>> >>> Please see this plugin:
>>> >>>
>>> >>> https://github.com/momer/nutch-selenium-grid-plugin
>>> >>>
>>> >>>
>>> >>> I’m CC’ing Mo, the author of the plugin, to see if he experienced
>>> >>> this while running the original selenium plugin - Mo, did using
>>> >>> selenium grid help with the issue that Shuo is experiencing below?
>>> >>>
>>> >>> Mo: are you cool with porting the grid plugin, or with Lewis or
>>> >>> me doing it to trunk (with full credit to you, of course)?
>>> >>>
>>> >>> Cheers,
>>> >>> Chris
>>> >>>
>>> >>>
>>> >>> -----Original Message-----
>>> >>> From: Shuo Li <[email protected]>
>>> >>> Reply-To: "[email protected]" <[email protected]>
>>> >>> Date: Friday, February 13, 2015 at 10:12 AM
>>> >>> To: "[email protected]" <[email protected]>
>>> >>> Subject: Vagrant Crushed When using Nutch-Selenium
>>> >>>
>>> >>>> Hey guys,
>>> >>>>
>>> >>>>
>>> >>>> I'm trying to use Nutch-Selenium to crawl nutch.apache.org
>>> >>>> <http://nutch.apache.org>. However, my vagrant seems to have
>>> >>>> crashed after a few minutes. I forced it to shut down and it turns
>>> >>>> out it only crawled 59 websites. My nutch version is 1.10 and my OS is
>>> >>>> Ubuntu Trusty, 14.04.
>>> >>>>
>>> >>>>
>>> >>>> Is there anything I can provide to you guys? Does anybody else have
>>> >>>> the same issue? Or is 59 websites the complete crawl?
>>> >>>>
>>> >>>>
>>> >>>> Any suggestion would be appreciated.
>>> >>>>
>>> >>>>
>>> >>>> Regards,
>>> >>>> Shuo Li
>>> >>
>>>
>>>
>>
>
