I am using fetcher.threads.per.queue = 30 by the way.
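For reference, that property lives in conf/nutch-site.xml; a minimal sketch of
the entry (30 is just the value I happen to be using) would be:

  <property>
    <name>fetcher.threads.per.queue</name>
    <value>30</value>
    <description>Max fetcher threads allowed per host/domain queue.</description>
  </property>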

On Mon, Feb 16, 2015 at 12:08 AM, Jiaxin Ye <[email protected]> wrote:

> Hi Mo,
>
> I have a problem with the selenium plugin on Mac. I think I successfully
> set it up, but I have a question about the performance.
> I am using a Mac with an Intel Core i5 processor and 8GB of RAM, but I found
> that each URL fetched takes about 1 second to open and close
> the Firefox window. Is that a normal speed, or is something wrong? And is it
> possible to install the Selenium Grid plugin on Mac? I will cry if you
> ask me to change machines now......
>
> Best,
> Jiaxin
>
> On Fri, Feb 13, 2015 at 2:09 PM, Mohammed Omer <[email protected]>
> wrote:
>
>> No worries man, glad everything works! Glad, since I was having hostname
>> issues with nutch/hbase just now as I quickly tried to get it working/fixed
>> for ya, ha.
>>
>> Mo
>>
>> On Fri, Feb 13, 2015 at 2:57 PM, Shuo Li <[email protected]> wrote:
>>
>>> Hey guys,
>>>
>>> After changing my RAM to 2GB, everything works fine. My bad. Thanks for
>>> your help.
>>>
>>> Regards,
>>> Shuo Li
>>>
>>> On Fri, Feb 13, 2015 at 11:34 AM, Mattmann, Chris A (3980) <
>>> [email protected]> wrote:
>>>
>>>> Thank you Mo. I sincerely appreciate your guidance and contribution.
>>>>
>>>> I will work to get your nutch selenium grid plugin contributed
>>>> so that it works with Nutch 1.x.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>>
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Chief Architect
>>>> Instrument Software and Science Data Systems Section (398)
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 168-519, Mailstop: 168-527
>>>> Email: [email protected]
>>>> WWW:  http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Associate Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Mo Omer <[email protected]>
>>>> Date: Friday, February 13, 2015 at 11:10 AM
>>>> To: Chris Mattmann <[email protected]>
>>>> Cc: "[email protected]" <[email protected]>
>>>> Subject: Re: Vagrant Crushed When using Nutch-Selenium
>>>>
>>>> >Hey all,
>>>> >
>When I ran nutch-selenium, it was in a config such that zombie processes
>>>> >were created when Firefox windows were closed, and they couldn't be
>>>> >reaped (again, due to the Docker configuration I had).
>>>> >
>>>> >In a normal setup it should not be an issue - if you're running 20
>>>> >threads in Nutch, that's potentially 20 open Firefox windows, which isn't
>>>> >good for 512MB of RAM.
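>>>> >
>>>> >(The thread count I mean is fetcher.threads.fetch in conf/nutch-site.xml;
>>>> >a rough sketch, with 20 only as an example value:
>>>> >
>>>> >  <property>
>>>> >    <name>fetcher.threads.fetch</name>
>>>> >    <value>20</value>
>>>> >    <description>Number of fetcher threads the fetcher should use.</description>
>>>> >  </property>)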
>>>> >
>>>> >Selenium Grid is much more efficient, in that browsers stay open and
>>>> >tabs are used to fetch sites - only the tabs are closed.
>>>> >
>>>> >Additionally, ensure you're using Nutch 2.2.1.
>>>> >
>>>> >Feel free to fork, patch, tinker, and open PRs as needed.
>>>> >
>>>> >Chris, if you want to be added to contribs on the GitHub project, that's
>>>> >cool with me! Wish I could dedicate more time to this, but I don't
>>>> >foresee using Nutch again in the near future, and am now working on
>>>> >projects that require lots of reading and possibly patches to Caffe and
>>>> >OpenCL R-CNN projects.
>>>> >
>>>> >Tl;dr:
>>>> >- No, this shouldn't be typical unless you're creating zombies like crazy
>>>> >and they're not being reaped (too many open file descriptors), running
>>>> >out of memory, or hitting a similar resource constraint.
>>>> >- Selenium Grid is TONs more efficient, but a bit more difficult to set
>>>> >up. I used it to crawl 100Ks of sites.
>>>> >- Unfortunately I can't commit more time to this, but if I can assist in
>>>> >any admin way, let me know.
>>>> >
>>>> >Thank you,
>>>> >
>>>> >Mo
>>>> >
>This message was drafted on a tiny touch screen; please forgive
>>>> >brevity & tpyos
>>>> >
>>>> >> On Feb 13, 2015, at 12:41 PM, "Mattmann, Chris A (3980)"
>>>> >><[email protected]> wrote:
>>>> >>
>>>> >> Oh yes, please up your memory to at least 2GB.
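>>>> >>
>>>> >> If you're on the default VirtualBox provider, a minimal Vagrantfile
>>>> >> sketch for that (2048MB is just an example value) would be:
>>>> >>
>>>> >>   Vagrant.configure("2") do |config|
>>>> >>     config.vm.provider "virtualbox" do |vb|
>>>> >>       # bump the VM memory from the default 512MB to 2GB
>>>> >>       vb.memory = "2048"
>>>> >>     end
>>>> >>   end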
>>>> >>
>>>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> >> Chris Mattmann, Ph.D.
>>>> >> Chief Architect
>>>> >> Instrument Software and Science Data Systems Section (398)
>>>> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> >> Office: 168-519, Mailstop: 168-527
>>>> >> Email: [email protected]
>>>> >> WWW:  http://sunset.usc.edu/~mattmann/
>>>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> >> Adjunct Associate Professor, Computer Science Department
>>>> >> University of Southern California, Los Angeles, CA 90089 USA
>>>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> -----Original Message-----
>>>> >> From: Shuo Li <[email protected]>
>>>> >> Reply-To: "[email protected]" <[email protected]>
>>>> >> Date: Friday, February 13, 2015 at 10:38 AM
>>>> >> To: "[email protected]" <[email protected]>
>>>> >> Cc: Mo Omer <[email protected]>
>>>> >> Subject: Re: Vagrant Crushed When using Nutch-Selenium
>>>> >>
>>>> >>> Hey Mo and Prof Mattmann,
>>>> >>>
>>>> >>>
>>>> >>> I will try to crawl the 3 websites in the homework tonight (NASA AMD,
>>>> >>> NSF ACADIS, and NSIDC Arctic Data Explorer). I will let you know
>>>> >>> what's going on.
>>>> >>>
>>>> >>>
>>>> >>> Is memory an issue? My vagrant only has 512MB of memory.
>>>> >>>
>>>> >>>
>>>> >>> Regards,
>>>> >>> Shuo Li
>>>> >>>
>>>> >>>
>>>> >>> On Fri, Feb 13, 2015 at 10:25 AM, Mattmann, Chris A (3980)
>>>> >>> <[email protected]> wrote:
>>>> >>>
>>>> >>> Hi Shuo,
>>>> >>>
>>>> >>> Thanks for your email. I wonder if using selenium grid would
>>>> >>> help?
>>>> >>>
>>>> >>> Please see this plugin:
>>>> >>>
>>>> >>> https://github.com/momer/nutch-selenium-grid-plugin
>>>> >>>
>>>> >>>
>>>> >>> I’m CC’ing Mo, the author of the plugin, to see if he experienced
>>>> >>> this while running the original Selenium plugin. Mo, did using
>>>> >>> Selenium Grid help with the issue that Shuo is experiencing below?
>>>> >>>
>>>> >>> Mo: are you cool with porting the grid plugin to trunk, or with Lewis
>>>> >>> or me doing it (with full credit to you, of course)?
>>>> >>>
>>>> >>> Cheers,
>>>> >>> Chris
>>>> >>>
>>>> >>>
>>>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> >>> Chris Mattmann, Ph.D.
>>>> >>> Chief Architect
>>>> >>> Instrument Software and Science Data Systems Section (398)
>>>> >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> >>> Office: 168-519, Mailstop: 168-527
>>>> >>> Email: [email protected]
>>>> >>> WWW:  http://sunset.usc.edu/~mattmann/
>>>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> >>> Adjunct Associate Professor, Computer Science Department
>>>> >>> University of Southern California, Los Angeles, CA 90089 USA
>>>> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> -----Original Message-----
>>>> >>> From: Shuo Li <[email protected]>
>>>> >>> Reply-To: "[email protected]" <[email protected]>
>>>> >>> Date: Friday, February 13, 2015 at 10:12 AM
>>>> >>> To: "[email protected]" <[email protected]>
>>>> >>> Subject: Vagrant Crushed When using Nutch-Selenium
>>>> >>>
>>>> >>>> Hey guys,
>>>> >>>>
>>>> >>>>
>>>> >>>> I'm trying to use Nutch-Selenium to crawl
>>>> >>>> nutch.apache.org <http://nutch.apache.org>. However, my vagrant seems
>>>> >>>> to have crashed after a few minutes. I forced it to shut down and it
>>>> >>>> turns out it only crawled 59 websites. My Nutch version is 1.10 and
>>>> >>>> my OS is Ubuntu Trusty, 14.04.
>>>> >>>>
>>>> >>>>
>>>> >>>> Is there anything I can provide to you guys? Does anybody else have
>>>> >>>> the same issue? Or is 59 websites the complete crawl?
>>>> >>>>
>>>> >>>>
>>>> >>>> Any suggestion would be appreciated.
>>>> >>>>
>>>> >>>>
>>>> >>>> Regards,
>>>> >>>> Shuo Li
>>>> >>
>>>>
>>>>
>>>
>>
>
