Update:

If xvfb -screen scrn 1024x758x34 doesn't work,
try xvfb :11 -screen 0 1024x768x24
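
For anyone scripting this, here is a minimal shell sketch of how the second command line is put together. As far as I know, the value after the screen number is WIDTHxHEIGHTxDEPTH with the depth in bits, so the 34 in the first command is probably what gets rejected. The function names xvfb_geometry and xvfb_command are made up for illustration, and the accepted depths (8, 16, 24) are my assumption about a typical build, not something confirmed in this thread. The sketch only prints the command; it does not start a server.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Xvfb, Nutch, or Selenium.

# Build the WIDTHxHEIGHTxDEPTH value for the -screen option.
# Assumption: only depths 8, 16 and 24 are accepted by this sketch.
xvfb_geometry() {
    width=$1
    height=$2
    depth=$3
    case "$depth" in
        8|16|24) ;;
        *) echo "unsupported colour depth: $depth" >&2; return 1 ;;
    esac
    printf '%sx%sx%s\n' "$width" "$height" "$depth"
}

# Print (do not run) an Xvfb command line for the given display number.
# The binary is normally installed as Xvfb; the lowercase spelling used in
# this thread works on a case-insensitive file system.
xvfb_command() {
    display=$1
    geometry=$(xvfb_geometry "$2" "$3" "$4") || return 1
    printf 'Xvfb :%s -screen 0 %s\n' "$display" "$geometry"
}
```

Calling xvfb_command 11 1024 768 24 prints the command from the update above. To use it for real, you would run the printed command in the background and export DISPLAY=:11 in the terminal that starts the crawl.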


On Thu, Feb 19, 2015 at 1:25 AM, Jaydeep Bagrecha <[email protected]> wrote:

> Update:
>
>  The latest Selenium version (2.44.0) doesn’t seem to work with the latest
> Firefox version (35), so I installed Firefox 29 and it’s crawling properly
> now.
>
> On Feb 18, 2015, at 2:56 PM, Jaydeep Bagrecha <[email protected]> wrote:
>
> Thanks Jiaxin!
>
> I repeated the entire installation procedure and I think I have now
> installed it correctly (it said BUILD SUCCESSFUL after the ant runtime
> command, and the Selenium jar files are in the runtime/local/lib folder).
>
> When I started crawling, the Mozilla browser popped up twice, but when I
> looked at the crawl statistics, it had fetched no URLs. (Did anyone else
> have this problem?)
>
> I got the following error while crawling:
>
> org.openqa.selenium.firefox.NotConnectedException: Unable to connect to
> host 127.0.0.1 on port 7055 after 45000 ms. Firefox console output:
> h changes to installed add-ons
> 1424295898279 addons.xpi-utils DEBUG Updating add-on states
> 1424295898281 addons.xpi-utils DEBUG Writing add-ons list
> 1424295898291 addons.manager DEBUG Registering shutdown blocker for
> XPIProvider
> 1424295898292 addons.manager DEBUG Registering shutdown blocker for
> LightweightThemeManager
> 1424295898295 addons.manager DEBUG Registering shutdown blocker for
> OpenH264Provider
> 1424295898296 addons.manager DEBUG Registering shutdown blocker for
> PluginProvider
> 1424295898775 DeferredSave.extensions.json DEBUG Starting timer
> 1424295898800 DeferredSave.extensions.json DEBUG Starting write
> 1424295898858 addons.manager DEBUG shutdown
> 1424295898859 addons.manager DEBUG Calling shutdown blocker for
> XPIProvider
> 1424295898859 addons.xpi DEBUG shutdown
> 1424295898860 addons.xpi-utils DEBUG shutdown
> 1424295898861 addons.manager DEBUG Calling shutdown blocker for
> LightweightThemeManager
> 1424295898862 addons.manager DEBUG Calling shutdown blocker for
> OpenH264Provider
> 1424295898864 addons.manager DEBUG Calling shutdown blocker for
> PluginProvider
> 1424295899016 DeferredSave.extensions.json DEBUG Write succeeded
> 1424295899016 addons.xpi-utils DEBUG XPI Database saved, setting schema
> version preference to 16
> 1424295899017 addons.xpi DEBUG Notifying XPI shutdown observers
> 1424295899025 addons.manager DEBUG Async provider shutdown done
> 1424295900455 addons.manager DEBUG Loaded provider scope for
> resource://gre/modules/addons/XPIProvider.jsm: ["XPIProvider"]
> 1424295900459 addons.manager DEBUG Loaded provider scope for
> resource://gre/modules/LightweightThemeManager.jsm:
> ["LightweightThemeManager"]
> 1424295900468 addons.xpi DEBUG startup
> 1424295900470 addons.xpi INFO Mapping [email protected] to /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/[email protected]
> 1424295900471 addons.xpi DEBUG Ignoring file entry whose name is not a
> valid add-on ID:
> /var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/webdriver-staging
> 1424295900472 addons.xpi INFO Mapping
> {972ce4c6-7e08-4474-a285-3208198ce6fd} to
> /Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}
> 1424295900473 addons.xpi DEBUG Skipping unavailable install location
> app-system-share
> 1424295900475 addons.xpi DEBUG checkForChanges
> 1424295900476 addons.xpi DEBUG Loaded add-on state from prefs:
> {"app-profile":{"[email protected]":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/[email protected]","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
> 1424295900480 addons.xpi DEBUG getModTime: Recursive scan of
> {972ce4c6-7e08-4474-a285-3208198ce6fd}
> 1424295900483 addons.xpi DEBUG getInstallState changed: false, state:
> {"app-profile":{"[email protected]":{"d":"/var/folders/np/stzpy0s56v719zgrt_gsgzf40000gn/T/anonymous3766188187771514178webdriver-profile/extensions/[email protected]","e":false,"v":"2.42.2","st":1424295897000,"mt":1424295897000}},"app-global":{"{972ce4c6-7e08-4474-a285-3208198ce6fd}":{"d":"/Applications/Firefox.app/Contents/Resources/browser/extensions/{972ce4c6-7e08-4474-a285-3208198ce6fd}","e":true,"v":"35.0.1","st":1423704245000,"mt":1423704244000}}}
> 1424295900488 addons.xpi DEBUG No changes found
> 1424295900502 addons.manager DEBUG Registering shutdown blocker for
> XPIProvider
> 1424295900504 addons.manager DEBUG Registering shutdown blocker for
> LightweightThemeManager
> 1424295900507 addons.manager DEBUG Registering shutdown blocker for
> OpenH264Provider
> 1424295900508 addons.manager DEBUG Registering shutdown blocker for
> PluginProvider
> *** Blocklist::_preloadBlocklistFile: blocklist is disabled
> 1424295903113 addons.manager DEBUG Registering shutdown blocker for
> <unnamed-provider>
>
> at org.openqa.selenium.firefox.internal.NewProfileExtensionConnection.start(NewProfileExtensionConnection.java:118)
> at org.openqa.selenium.firefox.FirefoxDriver.startClient(FirefoxDriver.java:246)
> at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:114)
> at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:191)
> at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:186)
> at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:182)
> at org.openqa.selenium.firefox.FirefoxDriver.<init>(FirefoxDriver.java:95)
> at org.apache.nutch.protocol.selenium.HttpWebClient.getHtmlPage(HttpWebClient.java:53)
> at org.apache.nutch.protocol.selenium.HttpResponse.readPlainContent(HttpResponse.java:199)
> at org.apache.nutch.protocol.selenium.HttpResponse.<init>(HttpResponse.java:161)
> at org.apache.nutch.protocol.selenium.Http.getResponse(Http.java:56)
> at org.apache.nutch.protocol.http.api.HttpRobotRulesParser.getRobotRulesSet(HttpRobotRulesParser.java:101)
> at org.apache.nutch.protocol.RobotRulesParser.getRobotRulesSet(RobotRulesParser.java:151)
> at org.apache.nutch.protocol.http.api.HttpBase.getRobotRules(HttpBase.java:492)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:722)
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0, fetchQueues.getQueueCount=1
>
> On Feb 17, 2015, at 11:21 PM, Jiaxin Ye <[email protected]> wrote:
>
> Hi,
>
> When you applied the patch, did you see any failures? None can be
> tolerated. I am guessing there is something wrong with ivy.xml. I suggest
> checking out ALL the files in Nutch again and then retrying.
>
> Best,
> Jiaxin
>
> On Tuesday, February 17, 2015, Jaydeep Bagrecha <[email protected]> wrote:
>
>> Hi all,
>> I am trying to install and build Selenium with Nutch 1.10 on Mac OS X
>> Yosemite.
>>
>> I get the following errors after downloading the Selenium patch
>> (https://issues.apache.org/jira/browse/NUTCH-1933) and running the “ant
>> runtime” command (as mentioned by Jiaxin below). Any suggestions for
>> fixing them?
>>
>>  error: package org.openqa.selenium does not exist
>>     [javac] import org.openqa.selenium.By;
>>     [javac]                           ^
>>  error: package org.openqa.selenium does not exist
>>     [javac] import org.openqa.selenium.WebDriver;
>>     [javac]                           ^
>>  error: package org.openqa.selenium.firefox does not exist
>>     [javac] import org.openqa.selenium.firefox.FirefoxDriver;
>>     [javac]                                   ^
>>  error: package org.openqa.selenium.firefox does not exist
>>     [javac] import org.openqa.selenium.firefox.FirefoxProfile;
>> error: cannot find symbol
>>     [javac]   public static ThreadLocal<WebDriver> threadWebDriver = new ThreadLocal<WebDriver>() {
>>     [javac]                             ^
>>     [javac]   symbol:   class WebDriver
>>     [javac]   location: class HttpWebClient
>>  error: cannot find symbol
>>     [javac]     protected WebDriver initialValue()
>>     [javac]               ^
>>     [javac]   symbol: class WebDriver
>>  error: cannot find symbol
>>     [javac]       FirefoxProfile profile = new FirefoxProfile();
>>     [javac]       ^
>>     [javac]   symbol: class FirefoxProfile
>> error: cannot find symbol
>>     [javac]       WebDriver driver = new FirefoxDriver(profile);
>>     [javac]                              ^
>>     [javac]   symbol: class FirefoxDriver
>>  error: cannot find symbol
>>     [javac]       driver = new FirefoxDriver();
>>     [javac]                    ^
>>     [javac]   symbol:   class FirefoxDriver
>>     [javac]   location: class HttpWebClient
>>
>>  error: cannot find symbol
>>     [javac]       new WebDriverWait(driver, 3);
>>     [javac]           ^
>>     [javac]   symbol:   class WebDriverWait
>>     [javac]   location: class HttpWebClient
>>
>>  error: cannot find symbol
>>     [javac]       String innerHtml = driver.findElement(By.tagName("body")).getAttribute("innerHTML");
>>     [javac]                                             ^
>>     [javac]   symbol:   variable By
>>     [javac]   location: class HttpWebClient
>>
>> Thanks,
>> Jaydeep
>>
>> On Feb 12, 2015, at 11:37 PM, Jiaxin Ye <[email protected]> wrote:
>>
>> Sure. I will do it once I confirm it works...
>>
>> On Thursday, February 12, 2015, Mattmann, Chris A (3980) <
>> [email protected]> wrote:
>>
>>> This is great, Jiaxin, can you please make a wiki page on the Nutch
>>> wiki that has this information?
>>>
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Chris Mattmann, Ph.D.
>>> Chief Architect
>>> Instrument Software and Science Data Systems Section (398)
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 168-519, Mailstop: 168-527
>>> Email: [email protected]
>>> WWW:  http://sunset.usc.edu/~mattmann/
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Adjunct Associate Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>> -----Original Message-----
>>> From: Jiaxin Ye <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Thursday, February 12, 2015 at 9:39 PM
>>> To: "[email protected]" <[email protected]>
>>> Subject: Nutch-Selenium in Nutch 1.10
>>>
>>> >Hi Li, Shuo. You are so right. I finished installing and successfully
>>> >ran Nutch with Selenium and Firefox. I have a question though: does
>>> >your Firefox window pop up for every URL we crawl?
>>> >
>>> >
>>> >Hi Prof Mattmann. Here is how I think we install Selenium on a Mac
>>> >with an OS X version higher than 10.6:
>>> >
>>> >
>>> >1. Download XQuartz; it's a dmg file, so install it directly
>>> >2. Download Nutch 1.10
>>> >3. Download the patch and put it in the Nutch project directory
>>> >4. patch -p0 < THE PATCH NAME
>>> >5. DO NOT update build.xml and ivy.xml as the Selenium tutorial on
>>> >GitHub tells you to. The patch already updates those .xml files for
>>> >us, and it also installs lib-selenium and protocol-selenium for us
>>> >(correct me if I am wrong)
>>> >6. Update the Tika dependency if needed
>>> >7. Go to the Nutch project directory and run ant runtime
>>> >8. Download Firefox
>>> >9. Open a new terminal and type
>>> >    xvfb -screen scrn 1024x758x34 (I think you can set it smaller if you
>>> >want...)
>>> >    There should be some errors after entering the command (for me at
>>> >least). Manually create a /tmp/.X11-unix folder with sudo, and then set
>>> >its mode to 1777. Rerun the command. xvfb should then be working.
>>> >10. Go to nutch > runtime > local and run the crawling command
>>> >
>>> >
>>> >Hope it helps. :)
>>> >
>>> >
>>> >Best,
>>> >Jiaxin
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >On Thu, Feb 12, 2015 at 1:08 PM, Shuo Li
>>> ><[email protected]> wrote:
>>> >
>>> >I think I have possibly finished installing.
>>> >
>>> >
>>> >What you need to do:
>>> >0. Run git status and check out (revert) whatever you have modified.
>>> >1. patch -p0 < YOUR_PATCH_FILE
>>> >2. ant clean jar
>>> >3. ant runtime
>>> >
>>> >
>>> >Will try crawling using selenium later on. Hope this helped. >_<
>>> >
>>> >
>>> >On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A (3980)
>>> ><[email protected]> wrote:
>>> >
>>> >Yes, I believe you need to install X11. Why don't you try it and report
>>> >back what you find? Thanks.
>>> >
>>> >On Feb 12, 2015, at 8:28 AM, Jiaxin Ye <[email protected]> wrote:
>>> >
>>> >
>>> >
>>> >Hi professor, but can we use Selenium on Mac?
>>> >
>>> >On Thursday, February 12, 2015, Mattmann, Chris A (3980)
>>> ><[email protected]> wrote:
>>> >
>>> >You need Selenium Jiaxin, in order to crawl dynamic pages in the
>>> >polar dataset you have been assigned in my CSCI 572 search engines
>>> class.
>>> >
>>> >The instructions for integrating Selenium with Nutch 1.10-trunk
>>> >are here:
>>> >
>>> >https://issues.apache.org/jira/browse/NUTCH-1933
>>> >
>>> >
>>> >Cheers,
>>> >Chris
>>> >
>>> >
>>> >-----Original Message-----
>>> >From: Jiaxin Ye <[email protected]>
>>> >Reply-To: "[email protected]" <[email protected]>
>>> >Date: Thursday, February 12, 2015 at 12:46 AM
>>> >To: "[email protected]" <[email protected]>
>>> >Subject: Re: Nutch-Selenium in Nutch 1.10
>>> >
>>> >>Well, good choice. I am thinking of switching to Ubuntu now. The thing
>>> >>is, why do we need Selenium anyway? Does it just make crawling easier?
>>> >>
>>> >>On Thu, Feb 12, 2015 at 12:25 AM, Shuo Li
>>> >><[email protected]> wrote:
>>> >>
>>> >>Interestingly, I'm a mac user but I don't want to screw my laptop so
>>> I'm
>>> >>using vagrant with Ubuntu Trusty. It doesn't have GUI but Xvfb can
>>> still
>>> >>be installed properly. The issue would be I don't know how to integrate
>>> >>Selenium with Nutch 1.10.
>>> >>
>>> >>On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
>>> >><[email protected]> wrote:
>>> >>
>>> >>Hi all,
>>> >>
>>> >>
>>> >>Does anyone here know where to find a setup tutorial for Selenium on
>>> >>Mac? I find it difficult to install Xvfb on Mac.
>>> >>
>>> >>
>>> >>Best,
>>> >>Jiaxin
>>> >>
>>> >>
>>> >>On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh
>>> >><[email protected]> wrote:
>>> >>
>>> >>Hi Shuo Li,
>>> >>
>>> >>
>>> >>We were facing a similar issue. Prof. Mattmann suggested we look into
>>> >>this patch for Selenium on Nutch 1.10:
>>> >>https://issues.apache.org/jira/browse/NUTCH-1933.
>>> >>
>>> >>
>>> >>Hope this helps!
>>> >>
>>> >>
>>> >>Thanks,
>>> >>Sapna
>>> >>
>>> >>On Tue, Feb 10, 2015 at 9:36 PM, Shuo Li
>>> >><[email protected]> wrote:
>>> >>
>>> >>Yop,
>>> >>
>>> >>
>>> >>I'm trying to install selenium in Nutch 1.10. However, this error pops
>>> >>out:
>>> >>
>>> >>
>>> >>error: package org.apache.nutch.storage does not exist
>>> >>
>>> >>
>>> >>
>>> >>I can only find this package in Nutch 2.x. Is there a way to use
>>> Selenium
>>> >>in 1.10?
>>> >>
>>> >>
>>> >>Any advice would be appreciated.
>>> >>
>>> >>
>>> >>Regards,
>>> >>Shuo Li
>>> >>
>>> >>--
>>> >>Graduate Student
>>> >>MS in CS (Data Science)
>>> >>Viterbi School of Engineering
>>> >>University of Southern California
>>> >>
>>> >>
>>> >>Phone:
>>> >>+1 650-307-9848
>>>
>>>
>>
>
>
