Hi Nagarjun,

I faced the same issue and resolved it by deleting the 'runtime' directory and then recompiling Nutch (along with the Selenium plugin).
So cd into nutch trunk or branch and then execute:

rm -r runtime
ant runtime

Make sure you take a backup of your Nutch configurations before deleting runtime directory.

Best regards,
Mohammad Al-Mohsin

On Thu, Feb 19, 2015 at 10:16 PM, Nagarjun Pola <[email protected]> wrote:
> Yes. I should do that.
>
> Thank You Jiaxin.
>
> Best,
> Nagarjun Pola
>
>
> On Thu, Feb 19, 2015 at 10:15 PM, Jiaxin Ye <[email protected]> wrote:
>
>> Hmm...Why dont you try to git clone a new nutch and then use the nutch
>> only to see if you can crawl or not?
>>
>> On Thu, Feb 19, 2015 at 10:09 PM, Nagarjun Pola <[email protected]> wrote:
>>
>>> Hmm I don't think the crawler is being blocked of politeness because I
>>> am using the default Nutch configuration which is 1 request per second.
>>>
>>> And when I try to crawl with the sample URL by disabling Nutch plugin in
>>> the Nutch-site.xml I can retrieve some links.
>>>
>>> The problem seems to be in the selenium plugin. Though Firefox pops
>>> nothing is fetched.
>>>
>>> Best,
>>> Nagarjun Pola
>>>
>>>
>>> On Thu, Feb 19, 2015 at 10:05 PM, Jiaxin Ye <[email protected]> wrote:
>>>
>>>> Hi, my teammate is also suffering from this situation now and I
>>>> encountered this situation last night. But I am able to crawl now almost
>>>> without doing anything. The reason I may guess is that your crawler is
>>>> blocked by the website because not being polite. At least I believe that's
>>>> the reason why I got the same *Could not initialize class
>>>> org.apache.http.impl.conn.* last night. I don't how to solve it,
>>>> though..... Fortunate enough I think I am unbanned now, I guess? Hope it
>>>> helps......
>>>>
>>>> On Thu, Feb 19, 2015 at 9:44 PM, Nagarjun Pola <[email protected]> wrote:
>>>>
>>>>> I get the following error when tried with selenium. Firefox pops up
>>>>> couple of times but fetches nothing.
>>>>>
>>>>> Can anyone help me on this issue?
>>>>>
>>>>> -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
>>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>>   maxThreads = 1
>>>>>   inProgress = 1
>>>>>   crawlDelay = 5000
>>>>>   minCrawlDelay = 0
>>>>>   nextFetchTime = 1424410799976
>>>>>   now = 1424410803146
>>>>>   0. http://gcmd.gsfc.nasa.gov/
>>>>>
>>>>> fetch of http://gcmd.gsfc.nasa.gov/KeywordSearch/Home.do?Portal=amd&MetadataType=0
>>>>> failed with: java.lang.NoClassDefFoundError: Could not initialize class
>>>>> org.apache.http.impl.conn.ManagedHttpClientConnectionFactory
>>>>>
>>>>> -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
>>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>>   maxThreads = 1
>>>>>   inProgress = 0
>>>>>   crawlDelay = 5000
>>>>>   minCrawlDelay = 0
>>>>>   nextFetchTime = 1424410808305
>>>>>   now = 1424410804147
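
For reference, the rebuild steps from the reply at the top of this thread (back up the configuration, remove 'runtime', run ant runtime) could be scripted roughly as below. This is only a sketch under assumptions that are not in the original messages: that the edited configuration lives in runtime/local/conf inside the Nutch checkout, and that a function named rebuild_nutch_runtime and an optional ANT variable (to point at a different ant executable) are acceptable; both names are hypothetical.

```shell
# Sketch only: back up the Nutch runtime configuration, delete 'runtime',
# and rebuild with 'ant runtime'.
# Assumptions (not from the thread): the edited config is in
# runtime/local/conf; ANT is a hypothetical override for the ant executable.
# Usage: rebuild_nutch_runtime <nutch-checkout-dir> <backup-dir>
rebuild_nutch_runtime() (
    # The body runs in a subshell, so 'cd' and 'set' do not leak to the caller.
    set -eu

    # Create the backup directory first and resolve it to an absolute path,
    # because we change directory below.
    mkdir -p "$2"
    backup_dir=$(cd "$2" && pwd)

    cd "$1"

    # Copy the current configuration (if any) before it is deleted.
    if [ -d runtime/local/conf ]; then
        cp -R runtime/local/conf/. "$backup_dir"/
    fi

    # Same two steps as in the reply: remove 'runtime', then rebuild it.
    if [ -d runtime ]; then
        rm -r runtime
    fi
    "${ANT:-ant}" runtime
)
```

It would be called as, for example, rebuild_nutch_runtime ~/nutch ~/nutch-conf-backup (both paths are placeholders). After the rebuild, the saved files can be compared with the freshly generated ones and copied back as needed.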

