Update: if "xvfb -screen scrn 1024x758x34" doesn't work, try "xvfb :11 -screen 0 1024x768x24" instead.
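
In case the arguments of the second command are unclear, here is a minimal sketch. xvfb_cmd is a hypothetical helper made up for illustration (it is not part of Xvfb or Nutch); it only assembles the command line from a display number, width, height and colour depth. The binary is normally spelled Xvfb; the lowercase spelling used in this thread presumably works because the default Mac filesystem is case-insensitive.

```shell
#!/bin/sh
# Sketch only: xvfb_cmd is a hypothetical helper, not part of Xvfb or Nutch.
# It prints an Xvfb command line for one virtual screen (screen number 0).
xvfb_cmd() {
  display="$1"   # X display number, e.g. 11
  width="$2"     # screen width in pixels
  height="$3"    # screen height in pixels
  depth="$4"     # colour depth in bits, e.g. 24
  printf 'Xvfb :%s -screen 0 %sx%sx%s\n' "$display" "$width" "$height" "$depth"
}

# Print the command suggested in the update above.
xvfb_cmd 11 1024 768 24

# To actually start it (assumes Xvfb from XQuartz is on the PATH):
#   sudo mkdir -p /tmp/.X11-unix && sudo chmod 1777 /tmp/.X11-unix
#   $(xvfb_cmd 11 1024 768 24) &
#   export DISPLAY=:11   # X11 clients that honour DISPLAY connect to display :11
```

Running this prints "Xvfb :11 -screen 0 1024x768x24". Here :11 is the display number, 0 is the screen number, and 1024x768x24 is width x height x colour depth. Whether Firefox on a Mac actually renders into that display is not something this thread confirms.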

On Thu, Feb 19, 2015 at 1:25 AM, Jaydeep Bagrecha <[email protected]> wrote:

> Update:
>
> The latest Selenium version (2.44.0) doesn’t seem to work with the latest
> Firefox version (35), so I installed Firefox version 29 and it’s crawling
> properly now.
>
> On Feb 18, 2015, at 2:56 PM, Jaydeep Bagrecha <[email protected]> wrote:
>
> Thanks Jiaxin!
>
> I repeated the entire installation procedure and I think I have installed
> it correctly (it said BUILD SUCCESSFUL after the ant runtime command, and
> the selenium jar files are in the runtime/local/lib folder).
>
> When I started crawling, the Mozilla browser popped up 2 times, but when I
> looked at the crawl statistics, it had fetched no URLs. (Did anyone have
> this problem?)
>
> I had the following error while crawling:
>
> org.openqa.selenium.firefox.NotConnectedException: Unable to connect to host 127.0.0.1 on port 7055 after 45000 ms. Firefox console output:
> h changes to installed add-ons
> 1424295898279 addons.xpi-utils DEBUG Updating add-on states
> 1424295898281 addons.xpi-utils DEBUG Writing add-ons list
> 1424295898291 addons.manager DEBUG Registering shutdown blocker for XPIProvider
> 1424295898292 addons.manager DEBUG Registering shutdown blocker for LightweightThemeManager
> 1424295898295 addons.manager DEBUG Registering shutdown blocker for OpenH264Provider
> 1424295898296 addons.manager DEBUG Registering shutdown blocker for PluginProvider
> 1424295898775 DeferredSave.extensions.json DEBUG Starting timer
> 1424295898800 DeferredSave.extensions.json DEBUG Starting write
> 1424295898858 addons.manager DEBUG shutdown
> 1424295898859 addons.manager DEBUG Calling shutdown blocker for XPIProvider
> 1424295898859 addons.xpi DEBUG shutdown
> 1424295898860 addons.xpi-utils DEBUG shutdown
> 1424295898861 addons.manager DEBUG Calling shutdown blocker for LightweightThemeManager
> 1424295898862 addons.manager DEBUG Calling shutdown blocker for OpenH264Provider
> 1424295898864 addons.manager DEBUG Calling shutdown blocker for PluginProvider
> 1424295899016 DeferredSave.extensions.json DEBUG Write succeeded
> 1424295899016 addons.xpi-utils DEBUG XPI Database saved, setting schema version preference to 16
> 1424295899017 addons.xpi DEBUG Notifying XPI shutdown observers
> 1424295899025 addons.manager DEBUG Async provider shutdown done
> 1424295900455 addons.manager DEBUG Loaded provider scope for resource://gre/modules/addons/XPIProvider.jsm: ["XPIProvider"]
> 1424295900459 addons.manager DEBUG Loaded provider scope for resource://gre/modules/LightweightThemeManager.jsm: ["LightweightThemeManager"]
> 1424295900468 addons.xpi DEBUG startup
> 1424295900470 addons.xpi INFO Mapping [email protected] to /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/[email protected]
> 1424295900471 addons.xpi DEBUG Ignoring file entry whose name is not a valid add-on ID: /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/webdriver-staging
> 1424295900472 addons.xpi INFO Mapping {972ce4c6-7e08-4474-a285-3208198ce6fd} to /Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}
> 1424295900473 addons.xpi DEBUG Skipping unavailable install location app-system-share
> 1424295900475 addons.xpi DEBUG checkForChanges
> 1424295900476 addons.xpi DEBUG Loaded add-on state from prefs: {"app-profile":{"[email protected]":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/[email protected]","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
> 1424295900480 addons.xpi DEBUG getModTime: Recursive scan of {972ce4c6-7e08-4474-a285-3208198ce6fd}
> 1424295900483 addons.xpi DEBUG getInstallState changed: false, state: {"app-profile":{"[email protected]":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/[email protected]","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
> 1424295900488 addons.xpi DEBUG No changes found
> 1424295900502 addons.manager DEBUG Registering shutdown blocker for XPIProvider
> 1424295900504 addons.manager DEBUG Registering shutdown blocker for LightweightThemeManager
> 1424295900507 addons.manager DEBUG Registering shutdown blocker for OpenH264Provider
> 1424295900508 addons.manager DEBUG Registering shutdown blocker for PluginProvider
> *** Blocklist::_preloadBlocklistFile: blocklist is disabled
> 1424295903113 addons.manager DEBUG Registering shutdown blocker for <unnamed-provider>
>
> at org.openqa.selenium.firefox.internal.NewProfileExtensionConnection.start(NewProfileExtensionConnection.java:118)
> at org.openqa.selenium.firefox.FirefoxDriver.startClient(FirefoxDriver.java:246)
> at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:114)
> at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:191)
> at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:186)
> at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:182)
> at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:95)
> at org.apache.nutch.protocol.selenium.HttpWebClient.getHtmlPage(HttpWebClient.java:53)
> at org.apache.nutch.protocol.selenium.HttpResponse.readPlainContent(HttpResponse.java:199)
> at org.apache.nutch.protocol.selenium.HttpResponse.<init>(HttpResponse.java:161)
> at org.apache.nutch.protocol.selenium.Http.getResponse(Http.java:56)
> at org.apache.nutch.protocol.http.api.HttpRobotRulesParser.getRobotRulesSet(HttpRobotRulesParser.java:101)
> at org.apache.nutch.protocol.RobotRulesParser.getRobotRulesSet(RobotRulesParser.java:151)
> at org.apache.nutch.protocol.http.api.HttpBase.getRobotRules(HttpBase.java:492)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:722)
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
>
> On Feb 17, 2015, at 11:21 PM, Jiaxin Ye <[email protected]> wrote:
>
> Hi,
>
> When you installed the patch, did you see any failures? No failure can be
> tolerated. I am guessing there is something wrong with ivy.xml. I suggest
> that you check out ALL files in Nutch and then try again.
>
> Best,
> Jiaxin
>
> On Tuesday, February 17, 2015, Jaydeep Bagrecha <[email protected]> wrote:
>
>> Hi all,
>> I am trying to install and build Selenium with Nutch 1.10 on Mac Yosemite.
>>
>> I get the following errors after downloading the Selenium patch
>> (https://issues.apache.org/jira/browse/NUTCH-1933) and running the “ant
>> runtime” command (as mentioned by Jiaxin below). Any suggestions to avoid
>> them?
>>
>> error: package org.openqa.selenium does not exist
>> [javac] import org.openqa.selenium.By;
>> [javac] ^
>> error: package org.openqa.selenium does not exist
>> [javac] import org.openqa.selenium.WebDriver;
>> [javac] ^
>> error: package org.openqa.selenium.firefox does not exist
>> [javac] import org.openqa.selenium.firefox.FirefoxDriver;
>> [javac] ^
>> error: package org.openqa.selenium.firefox does not exist
>> [javac] import org.openqa.selenium.firefox.FirefoxProfile;
>> error: cannot find symbol
>> [javac] public static ThreadLocal<WebDriver> threadWebDriver = new ThreadLocal<WebDriver>() {
>> [javac] ^
>> [javac] symbol: class WebDriver
>> [javac] location: class HttpWebClient
>> error: cannot find symbol
>> [javac] protected WebDriver initialValue()
>> [javac] ^
>> [javac] symbol: class WebDriver
>> error: cannot find symbol
>> [javac] FirefoxProfile profile = new FirefoxProfile();
>> [javac] ^
>> [javac] symbol: class FirefoxProfile
>> error: cannot find symbol
>> [javac] WebDriver driver = new FirefoxDriver(profile);
>> [javac] ^
>> [javac] symbol: class FirefoxDriver
>> error: cannot find symbol
>> [javac] driver = new FirefoxDriver();
>> [javac] ^
>> [javac] symbol: class FirefoxDriver
>> [javac] location: class HttpWebClient
>>
>> error: cannot find symbol
>> [javac] new WebDriverWait(driver, 3);
>> [javac] ^
>> [javac] symbol: class WebDriverWait
>> [javac] location: class HttpWebClient
>>
>> error: cannot find symbol
>> [javac] String innerHtml = driver.findElement(By.tagName("body")).getAttribute("innerHTML");
>> [javac] ^
>> [javac] symbol: variable By
>> [javac] location: class HttpWebClient
>>
>> Thanks,
>> Jaydeep
>>
>> On Feb 12, 2015, at 11:37 PM, Jiaxin Ye <[email protected]> wrote:
>>
>> Sure. I will do it once I confirm it works...
>>
>> On Thursday, February 12, 2015, Mattmann, Chris A (3980) <[email protected]> wrote:
>>
>>> This is great, Jiaxin, can you please make a wiki page on the Nutch
>>> wiki that has this information?
>>>
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Chris Mattmann, Ph.D.
>>> Chief Architect
>>> Instrument Software and Science Data Systems Section (398)
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 168-519, Mailstop: 168-527
>>> Email: [email protected]
>>> WWW: http://sunset.usc.edu/~mattmann/
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Adjunct Associate Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>> -----Original Message-----
>>> From: Jiaxin Ye <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Thursday, February 12, 2015 at 9:39 PM
>>> To: "[email protected]" <[email protected]>
>>> Subject: Nutch-Selenium in Nutch 1.10
>>>
>>> >Hi Li, Shuo. You are so right. I finished installing and successfully
>>> >ran Nutch with Selenium and Firefox. I have a question though: does
>>> >your Firefox pop up for every URL we crawl?
>>> >
>>> >Hi Prof. Mattmann. Here is, I think, the way to install Selenium on a
>>> >Mac with an OS higher than 10.6:
>>> >
>>> >1. Download XQuartz; it's a dmg file, so install it directly.
>>> >2. Download Nutch 1.10.
>>> >3. Download the patch and put it in the Nutch project directory.
>>> >4. patch -p0 < THE PATCH NAME
>>> >5. Do NOT update build.xml and ivy.xml as the Selenium tutorial on
>>> >   GitHub tells you to. The patch already updates those .xml files for
>>> >   us, and it also installs lib-selenium and protocol-selenium for us
>>> >   (correct me if I am wrong).
>>> >6. Update the Tika dependency if needed.
>>> >7. Go to the Nutch project directory and run ant runtime.
>>> >8. Download Firefox.
>>> >9. Open a new terminal and type
>>> >   xvfb -screen scrn 1024x758x34
>>> >   (I think you can set it smaller if you want...)
>>> >   There should be some errors after entering the command (for me at
>>> >   least). Manually create a /tmp/.X11-unix folder with sudo, and then
>>> >   set its mode to 1777. Rerun the command. xvfb should be working.
>>> >10. Go to nutch > runtime > local and run the crawling command.
>>> >
>>> >Hope it helps. :)
>>> >
>>> >Best,
>>> >Jiaxin
>>> >
>>> >On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li <[email protected]> wrote:
>>> >
>>> >I think I have possibly finished installing.
>>> >
>>> >What you need to do:
>>> >0. Run git status and check out whatever you have modified.
>>> >1. patch -p0 < YOUR_PATCH_FILE
>>> >2. ant clean jar
>>> >3. ant runtime
>>> >
>>> >Will try crawling using Selenium later on. Hope this helped. >_<
>>> >
>>> >On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980) <[email protected]> wrote:
>>> >
>>> >Yes, I believe you need to install X11. Why don't you try it and
>>> >report back what you find? Thanks.
>>> >
>>> >Sent from my iPhone
>>> >
>>> >On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <[email protected]> wrote:
>>> >
>>> >Hi professor, but can we use Selenium on Mac?
>>> >
>>> >On Thursday, February 12, 2015, Mattmann, Chris A (3980) <[email protected]> wrote:
>>> >
>>> >You need Selenium, Jiaxin, in order to crawl dynamic pages in the
>>> >polar dataset you have been assigned in my CSCI 572 search engines class.
>>> >
>>> >The instructions for integrating Selenium with Nutch 1.10-trunk
>>> >are here:
>>> >
>>> >https://issues.apache.org/jira/browse/NUTCH-1933
>>> >
>>> >Cheers,
>>> >Chris
>>> >
>>> >-----Original Message-----
>>> >From: Jiaxin Ye <[email protected]>
>>> >Reply-To: "[email protected]" <[email protected]>
>>> >Date: Thursday, February 12, 2015 at 12:46 AM
>>> >To: "[email protected]" <[email protected]>
>>> >Subject: Re: Nutch-Selenium in Nutch 1.10
>>> >
>>> >>Well, good choice. I am thinking of changing to Ubuntu now. The thing
>>> >>is, why do we need Selenium anyway? Just to make crawling easier?
>>> >>
>>> >>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li <[email protected]> wrote:
>>> >>
>>> >>Interestingly, I'm a Mac user but I don't want to screw up my laptop,
>>> >>so I'm using Vagrant with Ubuntu Trusty. It doesn't have a GUI but Xvfb
>>> >>can still be installed properly. The issue is that I don't know how to
>>> >>integrate Selenium with Nutch 1.10.
>>> >>
>>> >>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye <[email protected]> wrote:
>>> >>
>>> >>Hi all,
>>> >>
>>> >>Does anyone here know where to find the setup tutorial for Selenium on
>>> >>Mac? I find it difficult to install Xvfb on Mac.
>>> >>
>>> >>Best,
>>> >>Jiaxin
>>> >>
>>> >>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh <[email protected]> wrote:
>>> >>
>>> >>Hi Shuo Li,
>>> >>
>>> >>We were facing a similar issue. Prof. Mattmann suggested we look into
>>> >>this patch for Selenium on Nutch 1.10:
>>> >>https://issues.apache.org/jira/browse/NUTCH-1933
>>> >>
>>> >>Hope this helps!
>>> >>
>>> >>Thanks,
>>> >>Sapna
>>> >>
>>> >>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li <[email protected]> wrote:
>>> >>
>>> >>Yop,
>>> >>
>>> >>I'm trying to install Selenium in Nutch 1.10. However, this error pops
>>> >>up:
>>> >>
>>> >>error: package org.apache.nutch.storage does not exist
>>> >>
>>> >>I can only find this package in Nutch 2.x. Is there a way to use
>>> >>Selenium in 1.10?
>>> >>
>>> >>Any advice would be appreciated.
>>> >>
>>> >>Regards,
>>> >>Shuo Li
>>> >>
>>> >>--
>>> >>Graduate Student
>>> >>MS in CS (Data Science)
>>> >>Viterbi School of Engineering
>>> >>University of Southern California
>>> >>
>>> >>Phone: +1 650-307-9848

