Yes. I should do that.
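
A side note on the politeness question: below is a rough sketch in plain Python (not part of Nutch) of what the queue dump quoted further down implies. It assumes crawlDelay, nextFetchTime and now are all in milliseconds. The helper names wait_ms and requests_per_second are made up for illustration.

```python
# Values copied from the two queue dumps in the quoted log.
# Assumption: every field is in milliseconds.
dumps = [
    {"crawl_delay": 5000, "next_fetch_time": 1424410799976, "now": 1424410803146},
    {"crawl_delay": 5000, "next_fetch_time": 1424410808305, "now": 1424410804147},
]


def wait_ms(dump):
    """Milliseconds until the queue may fetch again (<= 0 means it is already due)."""
    return dump["next_fetch_time"] - dump["now"]


def requests_per_second(dump):
    """Upper bound on the per-queue request rate implied by the crawl delay."""
    return 1000 / dump["crawl_delay"]


for d in dumps:
    print(wait_ms(d), "ms until next fetch;", requests_per_second(d), "requests/s at most")
```

If the millisecond assumption holds, the queue for this host is limited to one request every 5 seconds, which is slower than 1 request per second.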



Thank you, Jiaxin.



Best,
Nagarjun Pola

On Thu, Feb 19, 2015 at 10:15 PM, Jiaxin Ye <[email protected]> wrote:

> Hmm... Why don't you try to git clone a fresh copy of Nutch and then use
> only that Nutch to see if you can crawl or not?
> On Thu, Feb 19, 2015 at 10:09 PM, Nagarjun Pola <[email protected]> wrote:
>> Hmm, I don't think the crawler is being blocked for lack of politeness,
>> because I am using the default Nutch configuration, which is 1 request
>> per second.
>>
>> And when I crawl the sample URL with the plugin disabled in
>> nutch-site.xml, I can retrieve some links.
>>
>> The problem seems to be in the Selenium plugin. Though Firefox pops up,
>> nothing is fetched.
>>
>> Best,
>> Nagarjun Pola
>>
>>
>> On Thu, Feb 19, 2015 at 10:05 PM, Jiaxin Ye <[email protected]> wrote:
>>
>>> Hi, my teammate is also running into this now, and I encountered it
>>> last night. But I am able to crawl now, almost without having done
>>> anything. My guess is that your crawler is being blocked by the website
>>> for not being polite. At least I believe that's the reason why I got
>>> the same *Could not initialize class org.apache.http.impl.conn.* last
>>> night. I don't know how to solve it, though... Fortunately, I think I
>>> am unbanned now, I guess? Hope it helps...
>>>
>>> On Thu, Feb 19, 2015 at 9:44 PM, Nagarjun Pola <[email protected]> wrote:
>>>
>>>> I get the following error when I try crawling with Selenium. Firefox
>>>> pops up a couple of times but fetches nothing.
>>>>
>>>> Can anyone help me on this issue?
>>>>
>>>> -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1,
>>>> fetchQueues.getQueueCount=1
>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>   maxThreads    = 1
>>>>   inProgress    = 1
>>>>   crawlDelay    = 5000
>>>>   minCrawlDelay = 0
>>>>   nextFetchTime = 1424410799976
>>>>   now           = 1424410803146
>>>>   0. http://gcmd.gsfc.nasa.gov/
>>>>
>>>> fetch of
>>>> http://gcmd.gsfc.nasa.gov/KeywordSearch/Home.do?Portal=amd&MetadataType=0
>>>> failed with: java.lang.NoClassDefFoundError: Could not initialize class
>>>> org.apache.http.impl.conn.ManagedHttpClientConnectionFactory
>>>>
>>>> -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1,
>>>> fetchQueues.getQueueCount=1
>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>   maxThreads    = 1
>>>>   inProgress    = 0
>>>>   crawlDelay    = 5000
>>>>   minCrawlDelay = 0
>>>>   nextFetchTime = 1424410808305
>>>>   now           = 1424410804147
>>>>
>>>
>>>
>>
