No worries man, glad everything works! Glad, since I was having hostname
issues with nutch/hbase just now as I quickly tried to get it working/fixed
for ya, ha.

Mo

On Fri, Feb 13, 2015 at 2:57 PM, Shuo Li <[email protected]> wrote:

> Hey guys,
>
> After changing my RAM to 2GB, everything works fine. My bad. Thanks for your
> help.
>
> Regards,
> Shuo Li
>
> On Fri, Feb 13, 2015 at 11:34 AM, Mattmann, Chris A (3980) <
> [email protected]> wrote:
>
>> Thank you Mo. I sincerely appreciate your guidance and contribution.
>>
>> I will work to get your nutch selenium grid plugin contributed
>> to work with Nutch 1.x.
>>
>> Cheers,
>> Chris
>>
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: [email protected]
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Mo Omer <[email protected]>
>> Date: Friday, February 13, 2015 at 11:10 AM
>> To: Chris Mattmann <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Subject: Re: Vagrant Crushed When using Nutch-Selenium
>>
>> >Hey all,
>> >
>> >When I had run nutch-selenium, it was in a config such that zombies were
>> >created from closing Firefox windows and they couldn't be reaped (again,
>> >due to the docker configuration I had).
>> >
>> >In a normal setup, it should not be an issue - if you're running 20
>> >threads in nutch that's potentially 20 open FF windows which isn't good
>> >for 512mb.
>> >
>> >Selenium grid is much more efficient, in that browsers are opened, but
>> >tabs are used to fetch sites - and only those are closed.
>> >
>> >Additionally, ensure you're using Nutch 2.2.1.
>> >
>> >Feel free to fork, patch, tinker, and PR as needed.
>> >
>> >Chris, if you want to be added to contribs on the GitHub project, that's
>> >cool with me! Wish I could dedicate more time to this, but I don't
>> >foresee using Nutch again in the near future, and am now working on
>> >projects that require lots of reading and possibly patches to Caffe and
>> >opencl r-CNN projects.
>> >
>> >Tl;dr:
>> >- no, this shouldn't be typical unless you're creating zombies like crazy
>> >and they're not being reaped (too many open file descriptors), running
>> >out of memory, or similar resource constraint.
>> >- selenium grid is TONs more efficient, but a bit more difficult to set
>> >up. I used it to crawl 100ks of sites.
>> >- unfortunately I can't commit more time to this, but if I can assist in
>> >any admin way, let me know.
>> >
>> >Thank you,
>> >
>> >Mo
>> >
>> >This message was drafted on a tiny touch screen; please forgive brevity &
>> >tpyos
>> >
>> >> On Feb 13, 2015, at 12:41 PM, "Mattmann, Chris A (3980)"
>> >><[email protected]> wrote:
>> >>
>> >> Oh yes, please up your memory to like at least 2Gb..
>> >>
>> >> -----Original Message-----
>> >> From: Shuo Li <[email protected]>
>> >> Reply-To: "[email protected]" <[email protected]>
>> >> Date: Friday, February 13, 2015 at 10:38 AM
>> >> To: "[email protected]" <[email protected]>
>> >> Cc: Mo Omer <[email protected]>
>> >> Subject: Re: Vagrant Crushed When using Nutch-Selenium
>> >>
>> >>> Hey Mo and Prof Mattmann,
>> >>>
>> >>>
>> >>> I will try to crawl the 3 websites in the homework tonight (NASA AMD,
>> >>> NSF ACADIS and NSIDC Arctic Data Explorer). I will let you know what's
>> >>> going on.
>> >>>
>> >>>
>> >>> Is memory an issue? My vagrant only has 512MB of memory.
>> >>>
>> >>>
>> >>> Regards,
>> >>> Shuo Li
>> >>>
>> >>>
>> >>> On Fri, Feb 13, 2015 at 10:25 AM, Mattmann, Chris A (3980)
>> >>> <[email protected]> wrote:
>> >>>
>> >>> Hi Shuo,
>> >>>
>> >>> Thanks for your email. I wonder if using selenium grid would
>> >>> help?
>> >>>
>> >>> Please see this plugin:
>> >>>
>> >>> https://github.com/momer/nutch-selenium-grid-plugin
>> >>>
>> >>>
>> >>> I’m CC’ing Mo, the author of the plugin, to see if he experienced
>> >>> this while running the original selenium plugin. Mo, did using
>> >>> selenium grid help with the issue that Shuo is experiencing below?
>> >>>
>> >>> Mo: are you cool with porting the grid plugin to trunk, or with Lewis
>> >>> or me doing it (with full credit to you, of course)?
>> >>>
>> >>> Cheers,
>> >>> Chris
>> >>>
>> >>>
>> >>> -----Original Message-----
>> >>> From: Shuo Li <[email protected]>
>> >>> Reply-To: "[email protected]" <[email protected]>
>> >>> Date: Friday, February 13, 2015 at 10:12 AM
>> >>> To: "[email protected]" <[email protected]>
>> >>> Subject: Vagrant Crushed When using Nutch-Selenium
>> >>>
>> >>>> Hey guys,
>> >>>>
>> >>>>
>> >>>> I'm trying to use Nutch-Selenium to crawl nutch.apache.org
>> >>>> <http://nutch.apache.org>. However, my vagrant seems to have
>> >>>> crashed after a few minutes. I forced it to shut down and it turns
>> >>>> out it only crawled 59 websites. My nutch version is 1.10 and my OS
>> >>>> is Ubuntu Trusty, 14.04.
>> >>>>
>> >>>>
>> >>>> Is there anything I can provide to you guys? Does anybody else have
>> >>>> the same issue? Or are 59 websites the complete crawl?
>> >>>>
>> >>>>
>> >>>> Any suggestion would be appreciated.
>> >>>>
>> >>>>
>> >>>> Regards,
>> >>>> Shuo Li
>> >>
>>
>>
>
