Thanks Jiaxin! I repeated the entire installation procedure and I think I have installed it correctly (it said BUILD SUCCESSFUL after the ant runtime command, and the selenium jar files are in the runtime/local/lib folder).
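For anyone who wants to repeat the jar check, here is a rough sketch. It only lists jar files whose names start with "selenium" under runtime/local/lib. The helper name find_selenium_jars and the "selenium*.jar" pattern are my own assumptions for illustration, not something defined by the patch or by Nutch, so adjust the path and pattern to your checkout.

```python
from pathlib import Path


def find_selenium_jars(lib_dir):
    """Return sorted names of jar files in lib_dir whose name starts with 'selenium'.

    Hypothetical helper for illustration only; the 'selenium*.jar' pattern is
    an assumption about how the jars are named.
    """
    lib_path = Path(lib_dir)
    if not lib_path.is_dir():
        # A missing folder is reported the same way as "no jars found".
        return []
    return sorted(p.name for p in lib_path.glob("selenium*.jar"))


if __name__ == "__main__":
    # Folder named in the message above, relative to the Nutch project directory.
    jars = find_selenium_jars("runtime/local/lib")
    if jars:
        print("Found selenium jars:")
        for name in jars:
            print("  " + name)
    else:
        print("No selenium jars found in runtime/local/lib")
```

Run it from the Nutch project directory. An empty result would suggest the build did not pull in the selenium dependencies, but a non-empty result does not by itself prove the plugin is wired up correctly.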
When I started crawling, the Mozilla browser popped up 2 times, but when I looked at the crawl statistics it had fetched no URLs. (Did anyone have this problem?) I got the following error while crawling:

org.openqa.selenium.firefox.NotConnectedException: Unable to connect to host 127.0.0.1 on port 7055 after 45000 ms. Firefox console output:
h changes to installed add-ons
1424295898279 addons.xpi-utils DEBUG Updating add-on states
1424295898281 addons.xpi-utils DEBUG Writing add-ons list
1424295898291 addons.manager DEBUG Registering shutdown blocker for XPIProvider
1424295898292 addons.manager DEBUG Registering shutdown blocker for LightweightThemeManager
1424295898295 addons.manager DEBUG Registering shutdown blocker for OpenH264Provider
1424295898296 addons.manager DEBUG Registering shutdown blocker for PluginProvider
1424295898775 DeferredSave.extensions.json DEBUG Starting timer
1424295898800 DeferredSave.extensions.json DEBUG Starting write
1424295898858 addons.manager DEBUG shutdown
1424295898859 addons.manager DEBUG Calling shutdown blocker for XPIProvider
1424295898859 addons.xpi DEBUG shutdown
1424295898860 addons.xpi-utils DEBUG shutdown
1424295898861 addons.manager DEBUG Calling shutdown blocker for LightweightThemeManager
1424295898862 addons.manager DEBUG Calling shutdown blocker for OpenH264Provider
1424295898864 addons.manager DEBUG Calling shutdown blocker for PluginProvider
1424295899016 DeferredSave.extensions.json DEBUG Write succeeded
1424295899016 addons.xpi-utils DEBUG XPI Database saved, setting schema version preference to 16
1424295899017 addons.xpi DEBUG Notifying XPI shutdown observers
1424295899025 addons.manager DEBUG Async provider shutdown done
1424295900455 addons.manager DEBUG Loaded provider scope for resource://gre/modules/addons/XPIProvider.jsm: ["XPIProvider"]
1424295900459 addons.manager DEBUG Loaded provider scope for resource://gre/modules/LightweightThemeManager.jsm: ["LightweightThemeManager"]
1424295900468 addons.xpi DEBUG startup
1424295900470 addons.xpi INFO Mapping [email protected] to /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/[email protected]
1424295900471 addons.xpi DEBUG Ignoring file entry whose name is not a valid add-on ID: /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/webdriver-staging
1424295900472 addons.xpi INFO Mapping {972ce4c6-7e08-4474-a285-3208198ce6fd} to /Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}
1424295900473 addons.xpi DEBUG Skipping unavailable install location app-system-share
1424295900475 addons.xpi DEBUG checkForChanges
1424295900476 addons.xpi DEBUG Loaded add-on state from prefs: {"app-profile":{"[email protected]":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/[email protected]","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
1424295900480 addons.xpi DEBUG getModTime: Recursive scan of {972ce4c6-7e08-4474-a285-3208198ce6fd}
1424295900483 addons.xpi DEBUG getInstallState changed: false, state: {"app-profile":{"[email protected]":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/[email protected]","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
1424295900488 addons.xpi DEBUG No changes found
1424295900502 addons.manager DEBUG Registering shutdown blocker for XPIProvider
1424295900504 addons.manager DEBUG Registering shutdown blocker for LightweightThemeManager
1424295900507 addons.manager DEBUG Registering shutdown blocker for OpenH264Provider
1424295900508 addons.manager DEBUG Registering shutdown blocker for PluginProvider
*** Blocklist::_preloadBlocklistFile: blocklist is disabled
1424295903113 addons.manager DEBUG Registering shutdown blocker for <unnamed-provider>
    at org.openqa.selenium.firefox.internal.NewProfileExtensionConnection.start(NewProfileExtensionConnection.java:118)
    at org.openqa.selenium.firefox.FirefoxDriver.startClient(FirefoxDriver.java:246)
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:114)
    at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:191)
    at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:186)
    at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:182)
    at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:95)
    at org.apache.nutch.protocol.selenium.HttpWebClient.getHtmlPage(HttpWebClient.java:53)
    at org.apache.nutch.protocol.selenium.HttpResponse.readPlainContent(HttpResponse.java:199)
    at org.apache.nutch.protocol.selenium.HttpResponse.<init>(HttpResponse.java:161)
    at org.apache.nutch.protocol.selenium.Http.getResponse(Http.java:56)
    at org.apache.nutch.protocol.http.api.HttpRobotRulesParser.getRobotRulesSet(HttpRobotRulesParser.java:101)
    at org.apache.nutch.protocol.RobotRulesParser.getRobotRulesSet(RobotRulesParser.java:151)
    at org.apache.nutch.protocol.http.api.HttpBase.getRobotRules(HttpBase.java:492)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:722)
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1

> On Feb 17, 2015, at 11:21 PM, Jiaxin Ye <[email protected]> wrote:
>
> Hi,
>
> When you installed the patch, did you see any failures? No failure is tolerated. I am
> guessing there is something wrong with ivy.xml. I suggest that you check out
> ALL files in Nutch and then try it again.
>
> Best,
> Jiaxin
>
> On Tuesday, February 17, 2015, Jaydeep Bagrecha <[email protected]> wrote:
> Hi all,
> I am trying to install and build selenium with Nutch 1.10 on Mac
> Yosemite.
>
> I get the following error after downloading the selenium
> patch (https://issues.apache.org/jira/browse/NUTCH-1933) and while using the “ant
> runtime” command (as mentioned by Jiaxin below). Any suggestions to avoid it?
>
> error: package org.openqa.selenium does not exist
> [javac] import org.openqa.selenium.By;
> [javac] ^
> error: package org.openqa.selenium does not exist
> [javac] import org.openqa.selenium.WebDriver;
> [javac] ^
> error: package org.openqa.selenium.firefox does not exist
> [javac] import org.openqa.selenium.firefox.FirefoxDriver;
> [javac] ^
> error: package org.openqa.selenium.firefox does not exist
> [javac] import org.openqa.selenium.firefox.FirefoxProfile;
> error: cannot find symbol
> [javac] public static ThreadLocal<WebDriver> threadWebDriver = new ThreadLocal<WebDriver>() {
> [javac] ^
> [javac] symbol: class WebDriver
> [javac] location: class HttpWebClient
> error: cannot find symbol
> [javac] protected WebDriver initialValue()
> [javac] ^
> [javac] symbol: class WebDriver
> error: cannot find symbol
> [javac] FirefoxProfile profile = new FirefoxProfile();
> [javac] ^
> [javac] symbol: class FirefoxProfile
> error: cannot find symbol
> [javac] WebDriver driver = new FirefoxDriver(profile);
> [javac] ^
> [javac] symbol: class FirefoxDriver
> error: cannot find symbol
> [javac] driver = new FirefoxDriver();
> [javac] ^
> [javac] symbol: class FirefoxDriver
> [javac] location: class HttpWebClient
>
> error: cannot find symbol
> [javac] new WebDriverWait(driver, 3);
> [javac] ^
> [javac] symbol: class WebDriverWait
> [javac] location: class HttpWebClient
>
> error: cannot find symbol
> [javac] String innerHtml = driver.findElement(By.tagName("body")).getAttribute("innerHTML");
> [javac] ^
> [javac] symbol: variable By
> [javac] location: class HttpWebClient
>
> Thanks,
> Jaydeep
>
>> On Feb 12, 2015, at 11:37 PM, Jiaxin Ye <[email protected]> wrote:
>>
>> Sure. I will do it once I confirm it works...
>>
>> On Thursday, February 12, 2015, Mattmann, Chris A (3980) <[email protected]> wrote:
>> This is great, Jiaxin, can you please make a wiki page on the Nutch
>> wiki that has this information?
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Chief Architect
>> Instrument Software and Science Data Systems Section (398)
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 168-519, Mailstop: 168-527
>> Email: [email protected]
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Associate Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>> -----Original Message-----
>> From: Jiaxin Ye <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Thursday, February 12, 2015 at 9:39 PM
>> To: "[email protected]" <[email protected]>
>> Subject: Nutch-Selenium in Nutch 1.10
>>
>> >Hi Li, Shuo. You are so right. I finished installing and successfully ran
>> >Nutch with selenium and Firefox. I have a question though: does your
>> >Firefox always pop up for all the URLs we crawled?
>> >
>> >
>> >Hi Prof Mattmann.
>> >I think this is the way to install selenium on a Mac with
>> >OS higher than 10.6, I think...
>> >
>> >
>> >1. Download XQuartz; it's a dmg file, install it directly
>> >2. Download Nutch 1.10
>> >3. Download the patch and put it in the Nutch project directory
>> >4. patch -p0 < THE PATCH NAME
>> >5. DO NOT update the build.xml and the ivy.xml as the selenium tutorial
>> >on GitHub tells you to. The patch basically updates those .xml files for
>> >us. And the patch also installs lib-selenium and protocol-selenium for us
>> >(Correct me if
>> > I am wrong)
>> >6. Update tika dependency if needed
>> >7. Go to the Nutch project directory and run ant runtime
>> >8. Download Firefox
>> >9. Open a new terminal and type
>> > xvfb -screen scrn 1024x758x34 (I think you can set it smaller if you
>> >want...)
>> > There should be some errors after entering the command (for me at
>> >least). Manually sudo create a /tmp/.X11-unix folder, and then set the
>> >mode to 1777. Rerun the command. xvfb should be working.
>> >10. Go to nutch > runtime > local and run the crawling command
>> >
>> >
>> >Hope it helps. :)
>> >
>> >
>> >Best,
>> >Jiaxin
>> >
>> >On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li <[email protected]> wrote:
>> >
>> >I think I have possibly finished installing.
>> >
>> >
>> >What you need to do:
>> >0. git status and checkout what you have modified.
>> >1. patch -p0 < YOUR_PATCH_FILE
>> >2. ant clean jar
>> >3. ant runtime
>> >
>> >
>> >Will try crawling using selenium later on. Hope this helped. >_<
>> >
>> >
>> >On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980) <[email protected]> wrote:
>> >
>> >Yes I believe you need to install X11 - why don't you try and report back
>> >what you find thanks.
>> >
>> >Sent from my iPhone
>> >
>> >On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <[email protected]> wrote:
>> >
>> >Hi professor, but can we use Selenium on Mac?
>> >
>> >On Thursday, February 12, 2015, Mattmann, Chris A (3980) <[email protected]> wrote:
>> >
>> >You need Selenium Jiaxin, in order to crawl dynamic pages in the
>> >polar dataset you have been assigned in my CSCI 572 search engines class.
>> >
>> >The instructions for integrating Selenium with Nutch 1.10-trunk
>> >are here:
>> >
>> >https://issues.apache.org/jira/browse/NUTCH-1933
>> >
>> >Cheers,
>> >Chris
>> >
>> >-----Original Message-----
>> >From: Jiaxin Ye <[email protected]>
>> >Reply-To: "[email protected]" <[email protected]>
>> >Date: Thursday, February 12, 2015 at 12:46 AM
>> >To: "[email protected]" <[email protected]>
>> >Subject: Re: Nutch-Selenium in Nutch 1.10
>> >
>> >>Well, good choice. I am thinking changing to ubuntu now. The thing is why
>> >>do we need Selenium anyway? Just easier to perform crawling?
>> >>
>> >>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li <[email protected]> wrote:
>> >>
>> >>Interestingly, I'm a mac user but I don't want to screw my laptop so I'm
>> >>using vagrant with Ubuntu Trusty. It doesn't have GUI but Xvfb can still
>> >>be installed properly. The issue would be I don't know how to integrate
>> >>Selenium with Nutch 1.10.
>> >>
>> >>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye <[email protected]> wrote:
>> >>
>> >>Hi all,
>> >>
>> >>Does anyone here know where to find the setup tutorial for Selenium on Mac?
>> >>I find it difficult to install Xvfb on mac.
>> >>
>> >>Best,
>> >>Jiaxin
>> >>
>> >>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh <[email protected]> wrote:
>> >>
>> >>Hi Shuo Li,
>> >>
>> >>We were facing a similar issue. Prof. Mattmann suggested we look into this
>> >>patch for Selenium on Nutch 1.10:
>> >>https://issues.apache.org/jira/browse/NUTCH-1933
>> >>
>> >>Hope this helps!
>> >>
>> >>Thanks,
>> >>Sapna
>> >>
>> >>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li <[email protected]> wrote:
>> >>
>> >>Yop,
>> >>
>> >>I'm trying to install selenium in Nutch 1.10. However, this error pops
>> >>out:
>> >>
>> >>error: package org.apache.nutch.storage does not exist
>> >>
>> >>I can only find this package in Nutch 2.x. Is there a way to use Selenium
>> >>in 1.10?
>> >>
>> >>Any advice would be appreciated.
>> >>
>> >>Regards,
>> >>Shuo Li
>> >>
>> >>--
>> >>Graduate Student
>> >>MS in CS (Data Science)
>> >>Viterbi School of Engineering
>> >>University of Southern California
>> >>
>> >>Phone:
>> >>+1 650-307-9848
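
For anyone hitting the same NotConnectedException reported at the top of this message, it may help to check, while the browser window is open, whether anything is accepting connections on 127.0.0.1 port 7055. Below is a minimal sketch using only the Python standard library. The function name can_connect and the 2-second timeout are arbitrary choices for illustration. A successful connection only shows that something is listening on that port, not that it is the WebDriver extension.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 127.0.0.1:7055 is the host and port named in the NotConnectedException.
    if can_connect("127.0.0.1", 7055):
        print("Something is listening on 127.0.0.1:7055")
    else:
        print("Nothing is accepting connections on 127.0.0.1:7055")
```

If nothing is listening while Firefox is up, that would be consistent with the extension never starting inside the browser, though this check alone cannot say why.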

