Hi Nagarjun,

I faced the same issue and resolved it by deleting the 'runtime' directory
and then recompiling Nutch (along with the Selenium plugin).

So cd into your Nutch trunk or branch directory and then execute:

rm -r runtime
ant runtime

Make sure you back up your Nutch configuration before deleting the
runtime directory.
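
A minimal sketch of that backup step, in case it is useful. It assumes the
configuration you edited lives in runtime/local/conf and that backups should
go under your home directory; the function name and both default paths are
illustrative, so adjust them to your setup.

```shell
#!/bin/sh
# Copy a Nutch conf directory to a timestamped backup location.
# Assumptions (not from the original message): the edited configuration
# lives in runtime/local/conf, and backups go under $HOME by default.
backup_nutch_conf() {
    conf_dir="${1:-runtime/local/conf}"
    backup_root="${2:-$HOME/nutch-conf-backups}"

    # Refuse to continue if there is nothing to back up.
    if [ ! -d "$conf_dir" ]; then
        echo "no conf directory at $conf_dir" >&2
        return 1
    fi

    # Timestamped destination so repeated backups do not overwrite each other.
    dest="$backup_root/conf-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$backup_root" && cp -r "$conf_dir" "$dest" && echo "$dest"
}
```

From the Nutch source root, this could be chained with the commands above as
`backup_nutch_conf && rm -r runtime && ant runtime`, so the rebuild only
starts if the backup succeeded.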

Best regards,
Mohammad Al-Mohsin

On Thu, Feb 19, 2015 at 10:16 PM, Nagarjun Pola <[email protected]> wrote:

> Yes. I should do that.
>
> Thank You Jiaxin.
>
> Best,
> Nagarjun Pola
>
>
> On Thu, Feb 19, 2015 at 10:15 PM, Jiaxin Ye <[email protected]> wrote:
>
>> Hmm... Why don't you try git cloning a fresh copy of Nutch and then using
>> Nutch alone to see whether you can crawl or not?
>>
>> On Thu, Feb 19, 2015 at 10:09 PM, Nagarjun Pola <[email protected]> wrote:
>>
>>> Hmm, I don't think the crawler is being blocked for lack of politeness, because I
>>> am using the default Nutch configuration, which is 1 request per second.
>>>
>>> And when I try to crawl the sample URL with the Selenium plugin disabled in
>>> nutch-site.xml, I can retrieve some links.
>>>
>>> The problem seems to be in the Selenium plugin. Though Firefox pops up,
>>> nothing is fetched.
>>>
>>> Best,
>>> Nagarjun Pola
>>>
>>>
>>> On Thu, Feb 19, 2015 at 10:05 PM, Jiaxin Ye <[email protected]> wrote:
>>>
>>>> Hi, my teammate is also running into this problem now, and I
>>>> encountered it last night. But I am able to crawl now almost
>>>> without having changed anything. My guess is that your crawler is
>>>> being blocked by the website for not being polite. At least I believe that's
>>>> the reason why I got the same *Could not initialize class
>>>> org.apache.http.impl.conn.* last night. I don't know how to solve it,
>>>> though. Fortunately, I think I am unbanned now. Hope it
>>>> helps.
>>>>
>>>> On Thu, Feb 19, 2015 at 9:44 PM, Nagarjun Pola <[email protected]> wrote:
>>>>
>>>>> I get the following error when I try crawling with Selenium. Firefox pops up a
>>>>> couple of times but fetches nothing.
>>>>>
>>>>> Can anyone help me on this issue?
>>>>>
>>>>> -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
>>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>>   maxThreads    = 1
>>>>>   inProgress    = 1
>>>>>   crawlDelay    = 5000
>>>>>   minCrawlDelay = 0
>>>>>   nextFetchTime = 1424410799976
>>>>>   now           = 1424410803146
>>>>>   0. http://gcmd.gsfc.nasa.gov/
>>>>>
>>>>> fetch of
>>>>> http://gcmd.gsfc.nasa.gov/KeywordSearch/Home.do?Portal=amd&MetadataType=0
>>>>> failed with: java.lang.NoClassDefFoundError: Could not initialize class
>>>>> org.apache.http.impl.conn.ManagedHttpClientConnectionFactory
>>>>>
>>>>> -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
>>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>>   maxThreads    = 1
>>>>>   inProgress    = 0
>>>>>   crawlDelay    = 5000
>>>>>   minCrawlDelay = 0
>>>>>   nextFetchTime = 1424410808305
>>>>>   now           = 1424410804147
>>>>>
>>>>
>>>>
>>>
>>
>
