This is great, Jiaxin. Can you please make a page on the Nutch
wiki with this information?

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Jiaxin Ye <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, February 12, 2015 at 9:39 PM
To: "[email protected]" <[email protected]>
Subject: Nutch-Selenium in Nutch 1.10

>Hi Shuo Li. You were right. I finished the installation and successfully
>ran Nutch with Selenium and Firefox. I have a question, though: does
>Firefox pop up for every URL you crawl?
>
>Hi Prof. Mattmann. Here is how I think we can install Selenium on a Mac
>with OS X newer than 10.6 (I think):
>
>1. Download XQuartz (it is a .dmg file) and install it directly.
>2. Download Nutch 1.10
>3. Download the patch from NUTCH-1933 and put it in the Nutch project
>directory.
>4. Apply it: patch -p0 < YOUR_PATCH_FILE
>5. Do NOT edit build.xml and ivy.xml as the Selenium tutorial on GitHub
>tells you to. The patch already updates those .xml files for us, and it
>also adds the lib-selenium and protocol-selenium plugins (correct me if
>I am wrong).
>6. Update the Tika dependency if needed.
>7. Go to the Nutch project directory and run: ant runtime
>8. Download and install Firefox.
>9. Open a new terminal and type
>    Xvfb -screen scrn 1024x768x24
>    (scrn is the screen number, e.g. 0; you can probably use a smaller
>    resolution.)
>    You may get errors after entering the command (I did). If so, use
>    sudo to create the /tmp/.X11-unix directory and set its mode to 1777:
>        sudo mkdir /tmp/.X11-unix
>        sudo chmod 1777 /tmp/.X11-unix
>    Then rerun the command; Xvfb should now work.
>10. Go to runtime/local in the Nutch project directory and run the crawl
>command.
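Step 9 can be captured in a small bash script to run in its own terminal. This is a minimal sketch, not part of the NUTCH-1933 patch; the script name start-xvfb.sh, display :99, screen 0, the 1024x768x24 geometry and the DISPLAY_NUM, SCREEN_GEOMETRY and SUDO variables are all assumptions. It fixes up /tmp/.X11-unix (mode 1777) before starting Xvfb, then reminds you to set DISPLAY in the crawl terminal, since X11 clients such as Firefox find the server through that variable.

```shell
#!/usr/bin/env bash
# Sketch of step 9: start a virtual X display for Firefox/Selenium.
# Assumed values (not from the thread): display :99, screen 0, 1024x768x24.
set -euo pipefail

DISPLAY_NUM="${DISPLAY_NUM:-99}"
SCREEN_GEOMETRY="${SCREEN_GEOMETRY:-1024x768x24}"   # width x height x depth
X11_SOCKET_DIR="${X11_SOCKET_DIR:-/tmp/.X11-unix}"
SUDO="${SUDO-sudo}"   # assumption: set SUDO="" to skip sudo (e.g. as root)

# Print the Xvfb command line that start_xvfb will run.
xvfb_command() {
  echo "Xvfb :${DISPLAY_NUM} -screen 0 ${SCREEN_GEOMETRY}"
}

# Xvfb keeps its socket in /tmp/.X11-unix; create the directory if it is
# missing and give it mode 1777 (world-writable, sticky), as in step 9.
ensure_x11_socket_dir() {
  local dir="${1:-$X11_SOCKET_DIR}"
  [ -d "$dir" ] || $SUDO mkdir -p "$dir"
  $SUDO chmod 1777 "$dir"
}

# Run Xvfb in the foreground of this terminal (exec replaces the shell).
start_xvfb() {
  ensure_x11_socket_dir
  echo "In the terminal that runs the crawl: export DISPLAY=:${DISPLAY_NUM}"
  exec Xvfb ":${DISPLAY_NUM}" -screen 0 "${SCREEN_GEOMETRY}"
}

# Only act when called as: ./start-xvfb.sh start
if [ "${1:-}" = "start" ]; then
  start_xvfb
fi
```

Run ./start-xvfb.sh start in a spare terminal and leave it open; running the file with no arguments does nothing beyond defining the functions.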
>
>
>Hope it helps. :)
>
>
>Best,
>Jiaxin
>
>On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li
><[email protected]> wrote:
>
>I think I have finished the installation.
>
>
>What you need to do:
>0. Run git status, then git checkout any files you have modified.
>1. patch -p0 < YOUR_PATCH_FILE
>2. ant clean jar
>3. ant runtime
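The build steps above can be sketched as a bash script run from the Nutch source root. This is an illustration under assumptions, not an official Nutch tool: the script name apply-selenium-patch.sh, the DRY_RUN switch and the patch --dry-run check before applying are additions. Be aware that git checkout -- . discards all uncommitted edits to tracked files, which is what step 0 asks for.

```shell
#!/usr/bin/env bash
# Sketch of steps 0-3, run from the Nutch source root.
# Hypothetical usage: ./apply-selenium-patch.sh YOUR_PATCH_FILE
# Assumption: DRY_RUN=1 prints each command instead of running it.
set -euo pipefail

DRY_RUN="${DRY_RUN:-0}"

# Run a command, or just echo it in dry-run mode.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

apply_and_build() {
  local patch_file="$1"
  [ -f "$patch_file" ] || { echo "patch not found: $patch_file" >&2; return 1; }
  # Step 0: show local changes, then revert them so the patch applies cleanly.
  # WARNING: 'git checkout -- .' discards uncommitted edits to tracked files.
  run git status --short
  run git checkout -- .
  # Step 1: check that the patch applies, then apply it (-p0: paths as-is).
  run patch -p0 --dry-run -i "$patch_file"
  run patch -p0 -i "$patch_file"
  # Steps 2-3: rebuild, then assemble runtime/local for crawling.
  run ant clean jar
  run ant runtime
}

if [ "$#" -ge 1 ]; then
  apply_and_build "$1"
fi
```

For example, DRY_RUN=1 ./apply-selenium-patch.sh YOUR_PATCH_FILE prints the six commands without running them.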
>
>
>I will try crawling with Selenium later on. Hope this helps. >_<
>
>
>On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980)
><[email protected]> wrote:
>
>Yes, I believe you need to install X11. Why don't you try it and report
>back what you find? Thanks.
>
>
>On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <[email protected]> wrote:
>
>Hi Professor, can we use Selenium on a Mac?
>
>On Thursday, February 12, 2015, Mattmann, Chris A (3980)
><[email protected]> wrote:
>
>You need Selenium, Jiaxin, in order to crawl the dynamic pages in the
>polar dataset you have been assigned in my CSCI 572 search engines class.
>
>The instructions for integrating Selenium with Nutch 1.10-trunk
>are here:
>
>https://issues.apache.org/jira/browse/NUTCH-1933
>
>
>Cheers,
>Chris
>
>-----Original Message-----
>From: Jiaxin Ye <[email protected]>
>Reply-To: "[email protected]" <[email protected]>
>Date: Thursday, February 12, 2015 at 12:46 AM
>To: "[email protected]" <[email protected]>
>Subject: Re: Nutch-Selenium in Nutch 1.10
>
>>Well, good choice. I am thinking of switching to Ubuntu now. But why do
>>we need Selenium anyway? Is it just to make crawling easier?
>>
>>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
>><[email protected]> wrote:
>>
>>Interestingly, I'm a Mac user, but I don't want to mess up my laptop, so
>>I'm using Vagrant with Ubuntu Trusty. It has no GUI, but Xvfb can still be
>>installed properly. The problem is that I don't know how to integrate
>>Selenium with Nutch 1.10.
>>
>>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
>><[email protected]> wrote:
>>
>>Hi all,
>>
>>
>>Does anyone here know where to find a setup tutorial for Selenium on a
>>Mac? I am finding it difficult to install Xvfb on the Mac.
>>
>>
>>Best,
>>Jiaxin
>>
>>
>>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
>><[email protected]> wrote:
>>
>>Hi Shuo Li,
>>
>>
>>We were facing a similar issue. Prof. Mattmann suggested we look into this
>>patch for Selenium on Nutch 1.10:
>>https://issues.apache.org/jira/browse/NUTCH-1933
>>
>>
>>Hope this helps!
>>
>>
>>Thanks,
>>Sapna
>>
>>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
>><[email protected]> wrote:
>>
>>Yop,
>>
>>
>>I'm trying to install Selenium in Nutch 1.10. However, this error pops
>>up:
>>
>>
>>error: package org.apache.nutch.storage does not exist
>>
>>
>>
>>I can only find this package in Nutch 2.x. Is there a way to use Selenium
>>in 1.10?
>>
>>
>>Any advice would be appreciated.
>>
>>
>>Regards,
>>Shuo Li
>>
>>--
>>Graduate Student
>>MS in CS (Data Science)
>>Viterbi School of Engineering
>>University of Southern California
>>
>>
>>Phone:
>>+1 650-307-9848