This is great, Jiaxin. Can you please make a wiki page on the Nutch wiki that has this information?

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Jiaxin Ye <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, February 12, 2015 at 9:39 PM
To: "[email protected]" <[email protected]>
Subject: Nutch-Selenium in Nutch 1.10

>Hi Li, Shuo. You are so right. I finished installing and successfully ran
>Nutch with Selenium and Firefox. I have a question though: does your
>Firefox always pop up for all the URLs we crawled?
>
>Hi Prof Mattmann. I think this is the way to install Selenium on a Mac
>with an OS version higher than 10.6, I think...
>
>1. Download XQuartz. It is a .dmg file; install it directly.
>2. Download Nutch 1.10.
>3. Download the patch and put it in the Nutch project directory.
>4. patch -p0 < THE PATCH NAME
>5. DO NOT update build.xml and ivy.xml as the Selenium tutorial on
>   GitHub tells you to. The patch basically updates those .xml files
>   for us. The patch also installs lib-selenium and protocol-selenium
>   for us (correct me if I am wrong).
>6. Update the Tika dependency if needed.
>7. Go to the Nutch project directory and run ant runtime.
>8. Download Firefox.
>9. Open a new terminal and type
>   xvfb -screen scrn 1024x758x34 (I think you can set it smaller if you
>   want...)
>   There should be some errors after entering the command (for me at
>   least). Manually create a /tmp/.X11-unix folder with sudo, and then
>   set its mode to 1777. Rerun the command. xvfb should be working.
>10. Go to nutch > runtime > local and run the crawling command.
>
>Hope it helps. :)
>
>Best,
>Jiaxin
>
>On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li <[email protected]> wrote:
>
>I think I have possibly finished installing.
>
>What you need to do:
>0. git status and checkout what you have modified.
>1. patch -p0 < YOUR_PATCH_FILE
>2. ant clean jar
>3. ant runtime
>
>Will try crawling using Selenium later on. Hope this helped. >_<
>
>On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980)
><[email protected]> wrote:
>
>Yes, I believe you need to install X11. Why don't you try it and report
>back what you find? Thanks.
>
>Sent from my iPhone
>
>On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <[email protected]> wrote:
>
>Hi professor, but can we use Selenium on Mac?
>
>On Thursday, February 12, 2015, Mattmann, Chris A (3980)
><[email protected]> wrote:
>
>You need Selenium, Jiaxin, in order to crawl dynamic pages in the
>polar dataset you have been assigned in my CSCI 572 search engines class.
>
>The instructions for integrating Selenium with Nutch 1.10-trunk
>are here:
>
>https://issues.apache.org/jira/browse/NUTCH-1933
>
>Cheers,
>Chris
>
>-----Original Message-----
>From: Jiaxin Ye <[email protected]>
>Reply-To: "[email protected]" <[email protected]>
>Date: Thursday, February 12, 2015 at 12:46 AM
>To: "[email protected]" <[email protected]>
>Subject: Re: Nutch-Selenium in Nutch 1.10
>
>>Well, good choice. I am thinking of changing to Ubuntu now. The thing
>>is, why do we need Selenium anyway? Just to make crawling easier?
>>
>>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li <[email protected]> wrote:
>>
>>Interestingly, I'm a Mac user, but I don't want to mess up my laptop,
>>so I'm using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but
>>Xvfb can still be installed properly. The issue is that I don't know
>>how to integrate Selenium with Nutch 1.10.
>>
>>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye <[email protected]> wrote:
>>
>>Hi all,
>>
>>Does anyone here know where to find the setup tutorial for Selenium
>>on Mac? I find it difficult to install Xvfb on Mac.
>>
>>Best,
>>Jiaxin
>>
>>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
>><[email protected]> wrote:
>>
>>Hi Shuo Li,
>>
>>We were facing a similar issue. Prof. Mattmann suggested we look into
>>this patch for Selenium on Nutch 1.10:
>>https://issues.apache.org/jira/browse/NUTCH-1933
>>
>>Hope this helps!
>>
>>Thanks,
>>Sapna
>>
>>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li <[email protected]> wrote:
>>
>>Yop,
>>
>>I'm trying to install Selenium in Nutch 1.10. However, this error pops
>>up:
>>
>>error: package org.apache.nutch.storage does not exist
>>
>>I can only find this package in Nutch 2.x. Is there a way to use
>>Selenium in 1.10?
>>
>>Any advice would be appreciated.
>>
>>Regards,
>>Shuo Li
>>
>>--
>>Graduate Student
>>MS in CS (Data Science)
>>Viterbi School of Engineering
>>University of Southern California
>>
>>Phone: +1 650-307-9848
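
For the wiki page, the patch, build, and Xvfb steps reported in this thread could be collected into a small shell script. What follows is only a sketch under assumptions, not a tested installer. The names NUTCH_DIR, PATCH_FILE, DISPLAY_NUM, DRY_RUN and the run helper are hypothetical and do not come from the thread or from Nutch. The patch is applied with "patch -p0 -i FILE", which is equivalent to the "patch -p0 < FILE" form used in the thread. The Xvfb line uses the usual "Xvfb :DISPLAY -screen 0 WIDTHxHEIGHTxDEPTH" form with an assumed 1024x768x24 geometry, which is not the geometry typed in the thread. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the steps from the thread. Dry-run by default: commands are
# printed, not executed. Set DRY_RUN=0 to actually run them.
set -u

DRY_RUN="${DRY_RUN:-1}"                      # hypothetical switch, 1 = print only
NUTCH_DIR="${NUTCH_DIR:-./nutch}"            # hypothetical Nutch 1.10 source directory
PATCH_FILE="${PATCH_FILE:-YOUR_PATCH_FILE}"  # placeholder name, as in the thread
DISPLAY_NUM="${DISPLAY_NUM:-:99}"            # assumed X display number

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Shuo's steps 1-3 (Jiaxin's steps 4 and 7): apply the patch, then build.
apply_patch_and_build() {
  run cd "$NUTCH_DIR" &&
  run patch -p0 -i "$PATCH_FILE" &&
  run ant clean jar &&
  run ant runtime
}

# Jiaxin's step 9: create /tmp/.X11-unix with mode 1777, then start Xvfb.
# Xvfb stays in the foreground, which matches "open a new terminal".
# The 1024x768x24 geometry is an assumption, not the value from the thread.
start_xvfb() {
  run sudo mkdir -p /tmp/.X11-unix &&
  run sudo chmod 1777 /tmp/.X11-unix &&
  run Xvfb "$DISPLAY_NUM" -screen 0 1024x768x24
}

apply_patch_and_build
start_xvfb
```

Running the script as-is only lists the commands, so they can be reviewed before anything is changed. Running it with DRY_RUN=0 and real values for NUTCH_DIR and PATCH_FILE would execute them. The crawl itself (step 10) is left out because the thread does not give the crawl command.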

