Thanks, Jiaxin!

I repeated the entire installation procedure and I think I have installed 
it correctly (it said BUILD SUCCESSFUL after the ant runtime command, and the 
selenium jar files are in the runtime/local/lib folder).
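
A quick way to double-check that last point from the shell is sketched below. The function name has_selenium_jars is made up for illustration, and the default path is simply the runtime/local/lib folder mentioned above, taken relative to the Nutch project directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): succeed if the given folder
# contains at least one file whose name matches selenium*.jar.
has_selenium_jars() {
  lib_dir="${1:-runtime/local/lib}"   # default: the folder named in this mail
  ls "$lib_dir" 2>/dev/null | grep -qi '^selenium.*\.jar$'
}

# Usage, from the Nutch project directory:
#   has_selenium_jars && echo "selenium jars found" || echo "no selenium jars"
```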

When I started crawling, the Mozilla browser popped up twice, but when I checked 
the crawl statistics, it had fetched no URLs. (Did anyone else have this problem?)

I got the following error while crawling:

org.openqa.selenium.firefox.NotConnectedException: Unable to connect to host 
127.0.0.1 on port 7055 after 45000 ms. Firefox console output:
h changes to installed add-ons
1424295898279   addons.xpi-utils        DEBUG   Updating add-on states
1424295898281   addons.xpi-utils        DEBUG   Writing add-ons list
1424295898291   addons.manager  DEBUG   Registering shutdown blocker for 
XPIProvider
1424295898292   addons.manager  DEBUG   Registering shutdown blocker for 
LightweightThemeManager
1424295898295   addons.manager  DEBUG   Registering shutdown blocker for 
OpenH264Provider
1424295898296   addons.manager  DEBUG   Registering shutdown blocker for 
PluginProvider
1424295898775   DeferredSave.extensions.json    DEBUG   Starting timer
1424295898800   DeferredSave.extensions.json    DEBUG   Starting write
1424295898858   addons.manager  DEBUG   shutdown
1424295898859   addons.manager  DEBUG   Calling shutdown blocker for XPIProvider
1424295898859   addons.xpi      DEBUG   shutdown
1424295898860   addons.xpi-utils        DEBUG   shutdown
1424295898861   addons.manager  DEBUG   Calling shutdown blocker for 
LightweightThemeManager
1424295898862   addons.manager  DEBUG   Calling shutdown blocker for 
OpenH264Provider
1424295898864   addons.manager  DEBUG   Calling shutdown blocker for 
PluginProvider
1424295899016   DeferredSave.extensions.json    DEBUG   Write succeeded
1424295899016   addons.xpi-utils        DEBUG   XPI Database saved, setting 
schema version preference to 16
1424295899017   addons.xpi      DEBUG   Notifying XPI shutdown observers
1424295899025   addons.manager  DEBUG   Async provider shutdown done
1424295900455   addons.manager  DEBUG   Loaded provider scope for 
resource://gre/modules/addons/XPIProvider.jsm: ["XPIProvider"]
1424295900459   addons.manager  DEBUG   Loaded provider scope for 
resource://gre/modules/LightweightThemeManager.jsm: ["LightweightThemeManager"]
1424295900468   addons.xpi      DEBUG   startup
1424295900470   addons.xpi      INFO    Mapping [email protected] to 
/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/[email protected]
1424295900471   addons.xpi      DEBUG   Ignoring file entry whose name is not a 
valid add-on ID: 
/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/webdriver-staging
1424295900472   addons.xpi      INFO    Mapping 
{972ce4c6-7e08-4474-a285-3208198ce6fd} to 
/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}
1424295900473   addons.xpi      DEBUG   Skipping unavailable install location 
app-system-share
1424295900475   addons.xpi      DEBUG   checkForChanges
1424295900476   addons.xpi      DEBUG   Loaded add-on state from prefs: 
{"app-profile":{"[email protected]":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/[email protected]","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
1424295900480   addons.xpi      DEBUG   getModTime: Recursive scan of 
{972ce4c6-7e08-4474-a285-3208198ce6fd}
1424295900483   addons.xpi      DEBUG   getInstallState changed: false, state: 
{"app-profile":{"[email protected]":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/[email protected]","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
1424295900488   addons.xpi      DEBUG   No changes found
1424295900502   addons.manager  DEBUG   Registering shutdown blocker for 
XPIProvider
1424295900504   addons.manager  DEBUG   Registering shutdown blocker for 
LightweightThemeManager
1424295900507   addons.manager  DEBUG   Registering shutdown blocker for 
OpenH264Provider
1424295900508   addons.manager  DEBUG   Registering shutdown blocker for 
PluginProvider
*** Blocklist::_preloadBlocklistFile: blocklist is disabled
1424295903113   addons.manager  DEBUG   Registering shutdown blocker for 
<unnamed-provider>

        at 
org.openqa.selenium.firefox.internal.NewProfileExtensionConnection.start(NewProfileExtensionConnection.java:118)
        at 
org.openqa.selenium.firefox.FirefoxDriver.startClient(FirefoxDriver.java:246)
        at 
org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:114)
        at 
org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:191)
        at 
org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:186)
        at 
org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:182)
        at 
org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:95)
        at 
org.apache.nutch.protocol.selenium.HttpWebClient.getHtmlPage(HttpWebClient.java:53)
        at 
org.apache.nutch.protocol.selenium.HttpResponse.readPlainContent(HttpResponse.java:199)
        at 
org.apache.nutch.protocol.selenium.HttpResponse.<init>(HttpResponse.java:161)
        at org.apache.nutch.protocol.selenium.Http.getResponse(Http.java:56)
        at 
org.apache.nutch.protocol.http.api.HttpRobotRulesParser.getRobotRulesSet(HttpRobotRulesParser.java:101)
        at 
org.apache.nutch.protocol.RobotRulesParser.getRobotRulesSet(RobotRulesParser.java:151)
        at 
org.apache.nutch.protocol.http.api.HttpBase.getRobotRules(HttpBase.java:492)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:722)
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, 
fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, 
fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, 
fetchQueues.getQueueCount=1
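
One detail worth pulling out of the console output above: the add-on state lists version 2.42.2 for the extension in the webdriver profile and 35.0.1 for the entry under Firefox.app. An older Selenium paired with a newer Firefox is a common cause of this NotConnectedException, but that is only a guess for this case; the log does not confirm it. A minimal sketch for extracting those version strings from a saved copy of the output (the function name addon_versions and the file name are hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: read Firefox console output on stdin and print every
# add-on version recorded as "v":"..." in it, one per line, duplicates removed.
addon_versions() {
  grep -o '"v":"[^"]*"' | sed 's/"v":"\([^"]*\)"/\1/' | sort -u
}

# Usage (firefox-console.txt is a placeholder for wherever the output was saved):
#   addon_versions < firefox-console.txt
```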

> On Feb 17, 2015, at 11:21 PM, Jiaxin Ye <[email protected]> wrote:
> 
> Hi,
> 
> When you installed the patch, did you see any failures? No failure can be 
> tolerated. I am guessing there is something wrong with ivy.xml. I suggest 
> checking out ALL files in Nutch and then trying again. 
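
One way to check for failing hunks before any file is touched is a dry run. This is a minimal sketch, assuming the installed patch tool accepts --dry-run (GNU patch does); the function name and the patch file name are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every hunk of the patch would apply.
# Run it from the Nutch project directory; no files are modified.
patch_applies_cleanly() {
  patch -p0 --dry-run < "$1" >/dev/null 2>&1
}

# Usage (NUTCH-1933.patch is a placeholder for the downloaded patch file):
#   if patch_applies_cleanly NUTCH-1933.patch; then
#     patch -p0 < NUTCH-1933.patch && ant runtime
#   else
#     echo "patch has failing hunks; revert local changes and retry"
#   fi
```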
> 
> Best,
> Jiaxin
> 
> On Tuesday, February 17, 2015, Jaydeep Bagrecha <[email protected]> wrote:
> Hi all,
>       I am trying to install and build Selenium with Nutch 1.10 on Mac 
> Yosemite.
> 
> I get the following errors after downloading the Selenium 
> patch (https://issues.apache.org/jira/browse/NUTCH-1933) and running the "ant 
> runtime" command (as mentioned by Jiaxin below). Any suggestions to avoid them?
> 
>  error: package org.openqa.selenium does not exist
>     [javac] import org.openqa.selenium.By;
>     [javac]                           ^
>  error: package org.openqa.selenium does not exist
>     [javac] import org.openqa.selenium.WebDriver;
>     [javac]                           ^
>  error: package org.openqa.selenium.firefox does not exist
>     [javac] import org.openqa.selenium.firefox.FirefoxDriver;
>     [javac]                                   ^
>  error: package org.openqa.selenium.firefox does not exist
>     [javac] import org.openqa.selenium.firefox.FirefoxProfile;
> error: cannot find symbol
>     [javac]   public static ThreadLocal<WebDriver> threadWebDriver = new 
> ThreadLocal<WebDriver>() {
>     [javac]                             ^
>     [javac]   symbol:   class WebDriver
>     [javac]   location: class HttpWebClient
>  error: cannot find symbol
>     [javac]     protected WebDriver initialValue()
>     [javac]               ^
>     [javac]   symbol: class WebDriver
>  error: cannot find symbol
>     [javac]       FirefoxProfile profile = new FirefoxProfile();
>     [javac]       ^
>     [javac]   symbol: class FirefoxProfile
> error: cannot find symbol
>     [javac]       WebDriver driver = new FirefoxDriver(profile);
>     [javac]                              ^
>     [javac]   symbol: class FirefoxDriver
>  error: cannot find symbol
>     [javac]       driver = new FirefoxDriver();
>     [javac]                    ^
>     [javac]   symbol:   class FirefoxDriver
>     [javac]   location: class HttpWebClient
> 
>  error: cannot find symbol
>     [javac]       new WebDriverWait(driver, 3);
>     [javac]           ^
>     [javac]   symbol:   class WebDriverWait
>     [javac]   location: class HttpWebClient
> 
>  error: cannot find symbol
>     [javac]       String innerHtml = 
> driver.findElement(By.tagName("body")).getAttribute("innerHTML");
>     [javac]                                             ^
>     [javac]   symbol:   variable By
>     [javac]   location: class HttpWebClient
> 
> Thanks,
> Jaydeep
> 
>> On Feb 12, 2015, at 11:37 PM, Jiaxin Ye <[email protected]> wrote:
>> 
>> Sure. I will do it once I confirm it works...
>> 
>> On Thursday, February 12, 2015, Mattmann, Chris A (3980) 
>> <[email protected]> wrote:
>> This is great, Jiaxin, can you please make a wiki page on the Nutch
>> wiki that has this information?
>> 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: [email protected]
>> WWW:  http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 
>> -----Original Message-----
>> From: Jiaxin Ye <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Thursday, February 12, 2015 at 9:39 PM
>> To: "[email protected]" <[email protected]>
>> Subject: Nutch-Selenium in Nutch 1.10
>> 
>> >Hi Shuo Li. You are so right. I finished installing and successfully ran
>> >Nutch with Selenium and Firefox. I have a question though: does your
>> >Firefox pop up for every URL we crawl?
>> >
>> >
>> >Hi Prof. Mattmann. I think this is how to install Selenium on a Mac with
>> >an OS version higher than 10.6:
>> >
>> >
>> >1. Download XQuartz (it's a dmg file) and install it directly
>> >2. Download Nutch 1.10
>> >3. Download the patch and put it on the Nutch project directory
>> >4. patch -p0 < THE PATCH NAME
>> >5. DO NOT update build.xml and ivy.xml as the Selenium tutorial
>> >on GitHub tells you to. The patch already updates those .xml files for
>> >us, and it also installs lib-selenium and protocol-selenium for us
>> >(correct me if I am wrong)
>> >6. Update the Tika dependency if needed
>> >7. Go to the Nutch project directory and run ant runtime
>> >8. Download Firefox
>> >9. Open a new terminal and type
>> >    xvfb -screen scrn 1024x758x34 (I think you can set it smaller if you
>> >want...)
>> >    You may see some errors after entering the command (I did, at
>> >least). If so, manually create the /tmp/.X11-unix folder with sudo, set
>> >its mode to 1777, and rerun the command. Xvfb should then be working.
>> >10. Go to nutch > runtime > local and run the crawling command
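
For step 9, the usual Xvfb invocation takes a display name and a numeric screen, followed by a geometry of the form WIDTHxHEIGHTxDEPTH. The sketch below covers the directory fix and a typical start command. The display number :1 and the 1024x768x24 geometry are assumptions rather than the exact values from the steps, and the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: create the X11 socket directory with mode 1777
# (world-writable plus sticky bit), as described in step 9.
ensure_x11_socket_dir() {
  dir="${1:-/tmp/.X11-unix}"
  mkdir -p "$dir" && chmod 1777 "$dir"
}

# Usage (root is needed for the real /tmp/.X11-unix; display :1 is an assumption):
#   sudo mkdir -p /tmp/.X11-unix && sudo chmod 1777 /tmp/.X11-unix
#   Xvfb :1 -screen 0 1024x768x24 &
#   export DISPLAY=:1    # programs started from this shell then use the virtual display
```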
>> >
>> >
>> >Hope it helps. :)
>> >
>> >
>> >Best,
>> >Jiaxin
>> >
>> >
>> >
>> >
>> >
>> >On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li <[email protected]> wrote:
>> >
>> >I think I have possibly finished installing.
>> >
>> >
>> >What you need to do:
>> >0. git status and checkout what you have modified.
>> >1. patch -p0 < YOUR_PATCH_FILE
>> >2. ant clean jar
>> >3. ant runtime
>> >
>> >
>> >Will try crawling using selenium later on. Hope this helped. >_<
>> >
>> >
>> >On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980)
>> ><[email protected]> wrote:
>> >
>> >Yes, I believe you need to install X11. Why don't you try it and report
>> >back what you find? Thanks.
>> >
>> >Sent from my iPhone
>> >
>> >On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <[email protected]> wrote:
>> >
>> >
>> >
>> >Hi professor, but can we use Selenium on Mac?
>> >
>> >On Thursday, February 12, 2015, Mattmann, Chris A (3980)
>> ><[email protected]> wrote:
>> >
>> >You need Selenium Jiaxin, in order to crawl dynamic pages in the
>> >polar dataset you have been assigned in my CSCI 572 search engines class.
>> >
>> >The instructions for integrating Selenium with Nutch 1.10-trunk
>> >are here:
>> >
>> >https://issues.apache.org/jira/browse/NUTCH-1933
>> >
>> >
>> >Cheers,
>> >Chris
>> >
>> >
>> >-----Original Message-----
>> >From: Jiaxin Ye <[email protected]>
>> >Reply-To: "[email protected]" <[email protected]>
>> >Date: Thursday, February 12, 2015 at 12:46 AM
>> >To: "[email protected]" <[email protected]>
>> >Subject: Re: Nutch-Selenium in Nutch 1.10
>> >
>> >>Well, good choice. I am thinking of switching to Ubuntu now. But why
>> >>do we need Selenium anyway? Is it just to make crawling easier?
>> >>
>> >>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
>> >><[email protected]> wrote:
>> >>
>> >>Interestingly, I'm a Mac user, but I don't want to screw up my laptop, so
>> >>I'm using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can
>> >>still be installed properly. The issue is that I don't know how to
>> >>integrate Selenium with Nutch 1.10.
>> >>
>> >>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
>> >><[email protected]> wrote:
>> >>
>> >>Hi all,
>> >>
>> >>
>> >>Does anyone here know where to find the setup tutorial for Selenium on Mac?
>> >>I find it difficult to install Xvfb on Mac.
>> >>
>> >>
>> >>Best,
>> >>Jiaxin
>> >>
>> >>
>> >>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
>> >><[email protected]> wrote:
>> >>
>> >>Hi Shuo Li,
>> >>
>> >>
>> >>We were facing a similar issue. Prof. Mattmann suggested we look into this
>> >>patch for Selenium on Nutch 1.10:
>> >>https://issues.apache.org/jira/browse/NUTCH-1933
>> >>
>> >>
>> >>Hope this helps!
>> >>
>> >>
>> >>Thanks,
>> >>Sapna
>> >>
>> >>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
>> >><[email protected]> wrote:
>> >>
>> >>Yop,
>> >>
>> >>
>> >>I'm trying to install selenium in Nutch 1.10. However, this error pops
>> >>out:
>> >>
>> >>
>> >>error: package org.apache.nutch.storage does not exist
>> >>
>> >>
>> >>
>> >>I can only find this package in Nutch 2.x. Is there a way to use Selenium
>> >>in 1.10?
>> >>
>> >>
>> >>Any advice would be appreciated.
>> >>
>> >>
>> >>Regards,
>> >>Shuo Li
>> >>
>> >>--
>> >>Graduate Student
>> >>MS in CS (Data Science)
>> >>Viterbi School of Engineering
>> >>University of Southern California
>> >>
>> >>
>> >>Phone:
>> >>+1 650-307-9848
>> 
> 
