Yes. I should do that.
Thank you, Jiaxin.

Best,
Nagarjun Pola

On Thu, Feb 19, 2015 at 10:15 PM, Jiaxin Ye wrote:

> Hmm... why don't you try to git clone a fresh copy of Nutch and then use
> only Nutch to see whether you can crawl or not?
>
> On Thu, Feb 19, 2015 at 10:09 PM, Nagarjun Pola wrote:
>
>> Hmm, I don't think the crawler is being blocked for politeness, because I am
>> using the default Nutch configuration, which is 1 request per second.
>>
>> And when I try to crawl the sample URL with the plugin disabled in
>> nutch-site.xml, I can retrieve some links.
>>
>> The problem seems to be in the Selenium plugin. Though Firefox pops up,
>> nothing is fetched.
>>
>> Best,
>> Nagarjun Pola
>>
>> On Thu, Feb 19, 2015 at 10:05 PM, Jiaxin Ye wrote:
>>
>>> Hi, my teammate is also running into this right now, and I ran into it
>>> last night. But I am able to crawl now, almost without doing anything.
>>> My guess is that your crawler was blocked by the website for not being
>>> polite. At least, I believe that's why I got the same "Could not
>>> initialize class org.apache.http.impl.conn.*" error last night. I don't
>>> know how to solve it, though... Fortunately, I think I am unbanned now?
>>> Hope it helps...
>>>
>>> On Thu, Feb 19, 2015 at 9:44 PM, Nagarjun Pola wrote:
>>>
>>>> I get the following error when I try with Selenium. Firefox pops up
>>>> a couple of times but fetches nothing.
>>>>
>>>> Can anyone help me with this issue?
>>>>
>>>> -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>   maxThreads = 1
>>>>   inProgress = 1
>>>>   crawlDelay = 5000
>>>>   minCrawlDelay = 0
>>>>   nextFetchTime = 1424410799976
>>>>   now = 1424410803146
>>>>   0. http://gcmd.gsfc.nasa.gov/
>>>>
>>>> fetch of http://gcmd.gsfc.nasa.gov/KeywordSearch/Home.do?Portal=amd&MetadataType=0
>>>> failed with: java.lang.NoClassDefFoundError: Could not initialize class
>>>> org.apache.http.impl.conn.ManagedHttpClientConnectionFactory
>>>>
>>>> -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>   maxThreads = 1
>>>>   inProgress = 0
>>>>   crawlDelay = 5000
>>>>   minCrawlDelay = 0
>>>>   nextFetchTime = 1424410808305
>>>>   now = 1424410804147
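The queue dumps in the quoted log can be read with a simplified model of per-host politeness. The class below is hypothetical: `PolitenessQueue` and its methods are illustrative names, not Nutch's API. It assumes a fetcher thread may start a fetch only when `inProgress < maxThreads` and `now >= nextFetchTime`, and that finishing a fetch sets `nextFetchTime` to the end time plus `crawlDelay` (or `minCrawlDelay` when `maxThreads > 1`). The dumps report `crawlDelay = 5000`, i.e. 5000 ms between requests to this host. The fetch end time `1424410803305` used in `main` is inferred as the second dump's `nextFetchTime` minus 5000; it is not in the log.

```java
// Hypothetical, simplified model of the per-host politeness fields printed in
// the fetch-queue dump (maxThreads, inProgress, crawlDelay, minCrawlDelay,
// nextFetchTime, now). This is NOT Nutch's actual class or API.
public class PolitenessQueue {
    private final int maxThreads;      // concurrent fetches allowed for this host
    private final long crawlDelay;     // ms to wait after a fetch when maxThreads == 1
    private final long minCrawlDelay;  // ms to wait after a fetch when maxThreads > 1
    private int inProgress = 0;        // fetches currently running for this host
    private long nextFetchTime = 0L;   // earliest start of the next fetch (ms since epoch)

    public PolitenessQueue(int maxThreads, long crawlDelay, long minCrawlDelay) {
        this.maxThreads = maxThreads;
        this.crawlDelay = crawlDelay;
        this.minCrawlDelay = minCrawlDelay;
    }

    /** True if a new fetch may start at time {@code now}. */
    public boolean canFetch(long now) {
        return inProgress < maxThreads && now >= nextFetchTime;
    }

    /** Milliseconds until nextFetchTime is reached (0 if already reached). */
    public long waitMillis(long now) {
        return Math.max(0L, nextFetchTime - now);
    }

    public void startFetch() {
        inProgress++;
    }

    /** Marks a fetch as finished and schedules the next allowed fetch. */
    public void finishFetch(long endTime) {
        inProgress--;
        nextFetchTime = endTime + (maxThreads > 1 ? minCrawlDelay : crawlDelay);
    }

    public void setNextFetchTime(long t) {
        nextFetchTime = t;
    }

    public static void main(String[] args) {
        // Values from the two queue dumps in the log.
        PolitenessQueue q = new PolitenessQueue(1, 5000L, 0L);
        q.setNextFetchTime(1424410799976L);
        q.startFetch(); // first dump: inProgress = 1
        // nextFetchTime has passed, but the single slot is busy.
        System.out.println(q.canFetch(1424410803146L));   // false
        q.finishFetch(1424410803305L); // assumed end of the failed fetch
        // Second dump: slot is free, but the crawl delay has not elapsed.
        System.out.println(q.canFetch(1424410804147L));   // false
        System.out.println(q.waitMillis(1424410804147L)); // 4158
    }
}
```

Under these assumptions, the first dump shows the host's only slot occupied (`inProgress = 1`), which fits 49 of the 50 threads spin-waiting, and the second dump shows the queue idle but still about 4158 ms away from its next allowed fetch.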

