Hi Shuo Li, you are so right. I finished installing and successfully ran
Nutch with Selenium and Firefox. I have a question though: does your
Firefox window pop up for every URL we crawl?

Hi Prof. Mattmann, I think this is how to install Selenium on a Mac
running an OS X version higher than 10.6:
1. Download XQuartz. It's a .dmg file, so install it directly.
2. Download Nutch 1.10.
3. Download the patch and put it in the Nutch project directory.
4. patch -p0 < THE PATCH NAME
5. DO NOT update build.xml and ivy.xml as the Selenium tutorial on
GitHub tells you to. The patch already updates those .xml files for us,
and it also installs lib-selenium and protocol-selenium for us
(correct me if I am wrong).
6. Update tika dependency if needed
7. Go to the Nutch project directory and run ant runtime
8. Download Firefox
9. Open a new terminal and type
Xvfb -screen scrn 1024x768x24 (I think you can set the resolution smaller
if you want...)
You may see some errors after entering the command (I did, at least).
If so, manually create the /tmp/.X11-unix folder with sudo and set its
mode to 1777. Rerun the command and Xvfb should work.
10. Go to nutch > runtime > local and run the crawling command
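
For step 9, here is a rough sketch of the /tmp/.X11-unix fix as a small
shell helper. The function name ensure_x11_dir is made up for illustration,
and the display number :11 and the 1024x768x24 geometry in the commented
usage are assumptions rather than anything confirmed in this thread, so
adjust them as needed.

```shell
#!/bin/sh
# Sketch only: create the X11 socket directory that Xvfb expects and give it
# mode 1777 (world-writable with the sticky bit set).
# ensure_x11_dir is a hypothetical helper name, not part of Nutch or Xvfb.
ensure_x11_dir() {
    dir="${1:-/tmp/.X11-unix}"          # default path taken from step 9
    mkdir -p "$dir" && chmod 1777 "$dir"
}

# Assumed usage, left commented out so that sourcing this file changes nothing:
#   sudo sh -c 'mkdir -p /tmp/.X11-unix && chmod 1777 /tmp/.X11-unix'
#   Xvfb :11 -screen 0 1024x768x24 &    # display :11 is an assumption
#   export DISPLAY=:11                  # point Firefox at the virtual display
```

The helper takes the directory as an argument so it can be tried on a
scratch path first, before touching /tmp with sudo.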
Hope it helps. :)
Best,
Jiaxin
On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li <[email protected]> wrote:
> I think I have possibly finished installing.
>
> What you need to do:
> 0. Run git status and check out what you have modified.
> 1. patch -p0 < YOUR_PATCH_FILE
> 2. ant clean jar
> 3. ant runtime
>
> Will try crawling using selenium later on. Hope this helped. >_<
>
> On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980)
> <[email protected]> wrote:
>
>> Yes, I believe you need to install X11. Why don't you try it and report
>> back what you find? Thanks.
>>
>> On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <[email protected]> wrote:
>>
>> Hi professor, but can we use Selenium on Mac?
>>
>> On Thursday, February 12, 2015, Mattmann, Chris A (3980)
>> <[email protected]> wrote:
>>
>>> You need Selenium, Jiaxin, in order to crawl dynamic pages in the
>>> polar dataset you have been assigned in my CSCI 572 search engines class.
>>>
>>> The instructions for integrating Selenium with Nutch 1.10-trunk
>>> are here:
>>>
>>> https://issues.apache.org/jira/browse/NUTCH-1933
>>>
>>>
>>> Cheers,
>>> Chris
>>>
>>>
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Chris Mattmann, Ph.D.
>>> Chief Architect
>>> Instrument Software and Science Data Systems Section (398)
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 168-519, Mailstop: 168-527
>>> Email: [email protected]
>>> WWW: http://sunset.usc.edu/~mattmann/
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Adjunct Associate Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>> -----Original Message-----
>>> From: Jiaxin Ye <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Thursday, February 12, 2015 at 12:46 AM
>>> To: "[email protected]" <[email protected]>
>>> Subject: Re: Nutch-Selenium in Nutch 1.10
>>>
>>> >Well, good choice. I am thinking of switching to Ubuntu now. The thing
>>> >is, why do we need Selenium anyway? Just to make crawling easier?
>>> >
>>> >On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
>>> ><[email protected]> wrote:
>>> >
>>> >Interestingly, I'm a Mac user, but I don't want to screw up my laptop,
>>> >so I'm using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb
>>> >can still be installed properly. The issue is that I don't know how to
>>> >integrate Selenium with Nutch 1.10.
>>> >
>>> >On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
>>> ><[email protected]> wrote:
>>> >
>>> >Hi all,
>>> >
>>> >
>>> >Does anyone here know where to find the setup tutorial for Selenium on
>>> >Mac? I find it difficult to install Xvfb on Mac.
>>> >
>>> >
>>> >Best,
>>> >Jiaxin
>>> >
>>> >
>>> >On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
>>> ><[email protected]> wrote:
>>> >
>>> >Hi Shuo Li,
>>> >
>>> >
>>> >We were facing a similar issue. Prof. Mattmann suggested we look into
>>> >this patch for Selenium on Nutch 1.10:
>>> >https://issues.apache.org/jira/browse/NUTCH-1933.
>>> >
>>> >
>>> >Hope this helps!
>>> >
>>> >
>>> >Thanks,
>>> >Sapna
>>> >
>>> >On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
>>> ><[email protected]> wrote:
>>> >
>>> >Yop,
>>> >
>>> >
>>> >I'm trying to install Selenium in Nutch 1.10. However, this error pops
>>> >up:
>>> >
>>> >
>>> >error: package org.apache.nutch.storage does not exist
>>> >
>>> >
>>> >
>>> >I can only find this package in Nutch 2.x. Is there a way to use
>>> >Selenium in 1.10?
>>> >
>>> >
>>> >Any advice would be appreciated.
>>> >
>>> >
>>> >Regards,
>>> >Shuo Li
>>> >
>>> >--
>>> >Graduate Student
>>> >MS in CS (Data Science)
>>> >Viterbi School of Engineering
>>> >University of Southern California
>>> >
>>> >Phone: +1 650-307-9848
>>> >
>>>
>>>
>