Hi Nagarjun, I'm confused: since we are using a virtual framebuffer, why does Firefox pop up on your screen at all? I started Xvfb, then launched Firefox from a terminal. No window appeared, but Firefox is running.
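
For comparison, this is roughly how I start things. It is only a minimal sketch: the display number :99, the screen geometry, and the helper name run_on_display are my own choices for illustration, not anything Nutch or the Selenium plugin requires. If DISPLAY still points at your real desktop when Firefox is launched, that might explain the windows you see, though I'm not sure that is the cause.

```shell
# Run a command with DISPLAY pointed at a given X display (for example an
# Xvfb display), without changing DISPLAY for the rest of the shell session.
# run_on_display is a hypothetical helper name, not part of Nutch or Xvfb.
run_on_display() {
  display="$1"
  shift
  DISPLAY="$display" "$@"
}

# Typical use (commented out so the snippet is safe to source anywhere):
#   Xvfb :99 -screen 0 1280x1024x24 &    # start the virtual framebuffer
#   run_on_display :99 firefox &         # Firefox renders on :99, not on screen
```

Running `run_on_display :99 sh -c 'echo "$DISPLAY"'` is a quick way to confirm which display a child process would see.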
On Thu, Feb 19, 2015 at 10:58 PM, Nagarjun Pola <[email protected]> wrote:
> Thank You Mohammed.
>
> I just got a fresh copy of Nutch and built everything again from scratch
> and it seems to be fetching a lot of data with Firefox popping up every now
> and then.
>
> Best,
> Nagarjun Pola
>
>
> On Thu, Feb 19, 2015 at 10:51 PM, Mohammad Al-Mohsin <[email protected]> wrote:
>
>> Hi Nagarjun,
>>
>> I faced the same issue and got it resolved by deleting the 'runtime'
>> directory and then recompiling Nutch (along with the Selenium plugin).
>>
>> So cd into nutch trunk or branch and then execute:
>>
>> rm -r runtime
>> ant runtime
>>
>> Make sure you take a backup of your Nutch configurations before deleting
>> the runtime directory.
>>
>> Best regards,
>> Mohammad Al-Mohsin
>>
>> On Thu, Feb 19, 2015 at 10:16 PM, Nagarjun Pola <[email protected]> wrote:
>>
>>> Yes. I should do that.
>>>
>>> Thank You Jiaxin.
>>>
>>> Best,
>>> Nagarjun Pola
>>>
>>>
>>> On Thu, Feb 19, 2015 at 10:15 PM, Jiaxin Ye <[email protected]> wrote:
>>>
>>>> Hmm... Why don't you try to git clone a new nutch and then use the nutch
>>>> only to see if you can crawl or not?
>>>>
>>>> On Thu, Feb 19, 2015 at 10:09 PM, Nagarjun Pola <[email protected]> wrote:
>>>>
>>>>> Hmm I don't think the crawler is being blocked for lack of politeness,
>>>>> because I am using the default Nutch configuration, which is 1 request
>>>>> per second.
>>>>>
>>>>> And when I try to crawl with the sample URL by disabling Nutch plugin
>>>>> in the Nutch-site.xml I can retrieve some links.
>>>>>
>>>>> The problem seems to be in the selenium plugin. Though Firefox pops up,
>>>>> nothing is fetched.
>>>>>
>>>>> Best,
>>>>> Nagarjun Pola
>>>>>
>>>>>
>>>>> On Thu, Feb 19, 2015 at 10:05 PM, Jiaxin Ye <[email protected]> wrote:
>>>>>
>>>>>> Hi, my teammate is also suffering from this situation now and I
>>>>>> encountered this situation last night. But I am able to crawl now almost
>>>>>> without doing anything. My guess at the reason is that your crawler is
>>>>>> blocked by the website for not being polite. At least I believe that's
>>>>>> the reason why I got the same *Could not initialize class
>>>>>> org.apache.http.impl.conn.* last night. I don't know how to solve it,
>>>>>> though..... Fortunately I think I am unbanned now, I guess? Hope it
>>>>>> helps......
>>>>>>
>>>>>> On Thu, Feb 19, 2015 at 9:44 PM, Nagarjun Pola <[email protected]> wrote:
>>>>>>
>>>>>>> I get the following error when I tried with selenium. Firefox pops
>>>>>>> up a couple of times but fetches nothing.
>>>>>>>
>>>>>>> Can anyone help me on this issue?
>>>>>>>
>>>>>>> -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
>>>>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>>>>   maxThreads = 1
>>>>>>>   inProgress = 1
>>>>>>>   crawlDelay = 5000
>>>>>>>   minCrawlDelay = 0
>>>>>>>   nextFetchTime = 1424410799976
>>>>>>>   now = 1424410803146
>>>>>>>   0. http://gcmd.gsfc.nasa.gov/
>>>>>>>
>>>>>>> fetch of
>>>>>>> http://gcmd.gsfc.nasa.gov/KeywordSearch/Home.do?Portal=amd&MetadataType=0
>>>>>>> failed with: java.lang.NoClassDefFoundError: Could not initialize class
>>>>>>> org.apache.http.impl.conn.ManagedHttpClientConnectionFactory
>>>>>>>
>>>>>>> -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
>>>>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>>>>   maxThreads = 1
>>>>>>>   inProgress = 0
>>>>>>>   crawlDelay = 5000
>>>>>>>   minCrawlDelay = 0
>>>>>>>   nextFetchTime = 1424410808305
>>>>>>>   now = 1424410804147

