Re: Problem Fetching with Selenium Installed

Nagarjun Pola Thu, 19 Feb 2015 23:01:06 -0800

Thank you, Mohammad.




I just got a fresh copy of Nutch and built everything again from scratch, and it
seems to be fetching a lot of data, with Firefox popping up every now and then.



Best,
Nagarjun Pola

On Thu, Feb 19, 2015 at 10:51 PM, Mohammad Al-Mohsin <[email protected]> wrote:

> Hi Nagarjun,
> I faced the same issue and resolved it by deleting the 'runtime' directory
> and then recompiling Nutch (along with the Selenium plugin).
> So cd into the Nutch trunk or branch and then execute:
> rm -r runtime
> ant runtime
> Make sure you back up your Nutch configuration before deleting the
> runtime directory.
> Best regards,
> Mohammad Al-Mohsin
> On Thu, Feb 19, 2015 at 10:16 PM, Nagarjun Pola <[email protected]> wrote:
>> Yes. I should do that.
>>
>> Thank You Jiaxin.
>>
>> Best,
>> Nagarjun Pola
>>
>>
>> On Thu, Feb 19, 2015 at 10:15 PM, Jiaxin Ye <[email protected]> wrote:
>>
>>> Hmm... Why don't you try to git clone a fresh copy of Nutch and then use
>>> Nutch alone, to see whether you can crawl or not?
>>>
>>> On Thu, Feb 19, 2015 at 10:09 PM, Nagarjun Pola <[email protected]> wrote:
>>>
>>>> Hmm, I don't think the crawler is being blocked for lack of politeness,
>>>> because I am using the default Nutch configuration, which is 1 request
>>>> per second.
>>>>
>>>> And when I try to crawl the sample URL with the Nutch plugin disabled in
>>>> nutch-site.xml, I can retrieve some links.
>>>>
>>>> The problem seems to be in the Selenium plugin. Though Firefox pops up,
>>>> nothing is fetched.
>>>>
>>>> Best,
>>>> Nagarjun Pola
>>>>
>>>>
>>>> On Thu, Feb 19, 2015 at 10:05 PM, Jiaxin Ye <[email protected]> wrote:
>>>>
>>>>> Hi, my teammate is also running into this now, and I encountered it
>>>>> last night. But I am able to crawl now, almost without having changed
>>>>> anything. My guess is that your crawler is being blocked by the website
>>>>> for not being polite. At least I believe that's the reason why I got
>>>>> the same *Could not initialize class
>>>>> org.apache.http.impl.conn.* last night. I don't know how to solve it,
>>>>> though. Fortunately, I think I am unbanned now, I guess? Hope it
>>>>> helps.
>>>>>
>>>>> On Thu, Feb 19, 2015 at 9:44 PM, Nagarjun Pola <[email protected]> wrote:
>>>>>
>>>>>> I get the following error when I try with Selenium. Firefox pops up
>>>>>> a couple of times but fetches nothing.
>>>>>>
>>>>>> Can anyone help me on this issue?
>>>>>>
>>>>>> -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
>>>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>>>   maxThreads    = 1
>>>>>>   inProgress    = 1
>>>>>>   crawlDelay    = 5000
>>>>>>   minCrawlDelay = 0
>>>>>>   nextFetchTime = 1424410799976
>>>>>>   now           = 1424410803146
>>>>>>   0. http://gcmd.gsfc.nasa.gov/
>>>>>> fetch of http://gcmd.gsfc.nasa.gov/KeywordSearch/Home.do?Portal=amd&MetadataType=0 failed with: java.lang.NoClassDefFoundError: Could not initialize class org.apache.http.impl.conn.ManagedHttpClientConnectionFactory
>>>>>> -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
>>>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>>>   maxThreads    = 1
>>>>>>   inProgress    = 0
>>>>>>   crawlDelay    = 5000
>>>>>>   minCrawlDelay = 0
>>>>>>   nextFetchTime = 1424410808305
>>>>>>   now           = 1424410804147
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
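
The fix Mohammad describes above (back up the configuration, delete the
runtime directory, rebuild with ant) could be sketched as a small shell
function. This is only an illustration under stated assumptions: the function
name, the backup location, and the NUTCH_BUILD_CMD variable are hypothetical,
and it assumes the edited configuration lives in runtime/local/conf, which the
thread does not confirm.

```shell
#!/usr/bin/env bash
# Sketch of the "delete runtime and rebuild" fix from this thread.
# Hypothetical names: rebuild_nutch_runtime, NUTCH_BUILD_CMD, conf-backup.
# Assumption: the configuration you edited is under runtime/local/conf.

rebuild_nutch_runtime() {
  local nutch_home="$1"
  local backup_dir="${2:-$nutch_home/conf-backup}"
  # Defaults to the command given in the thread; override for a dry run.
  local build_cmd="${NUTCH_BUILD_CMD:-ant runtime}"

  # Refuse to continue without a real checkout directory.
  if [ ! -d "$nutch_home" ]; then
    echo "not a directory: $nutch_home" >&2
    return 1
  fi

  # Back up the configuration before anything is deleted.
  if [ -d "$nutch_home/runtime/local/conf" ]; then
    mkdir -p "$backup_dir" || return 1
    cp -R "$nutch_home/runtime/local/conf/." "$backup_dir/" || return 1
  fi

  # Remove the old build output, then rebuild from the checkout root.
  rm -rf "$nutch_home/runtime" || return 1
  ( cd "$nutch_home" && $build_cmd )
}

# Example (not executed here):
#   rebuild_nutch_runtime /path/to/nutch
```

Copying the configuration out before the delete is the step Mohammad warns
about; after the rebuild, compare the backed-up files with the regenerated
ones before copying anything back.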
