Re: Problem Fetching with Selenium Installed

Qing Liu Fri, 20 Feb 2015 14:30:56 -0800

Hi Nagarjun, I'm really confused: since we are using a virtual
framebuffer, why would Firefox pop up? I started Xvfb, then opened Firefox
from a terminal, and nothing popped up, but Firefox is running.


On Thu, Feb 19, 2015 at 10:58 PM, Nagarjun Pola <[email protected]> wrote:

> Thank You Mohammed.
>
> I just got a fresh copy of Nutch and built everything again from scratch,
> and it seems to be fetching a lot of data, with Firefox popping up every
> now and then.
>
> Best,
> Nagarjun Pola
>
>
> On Thu, Feb 19, 2015 at 10:51 PM, Mohammad Al-Mohsin <[email protected]> wrote:
>
>>  Hi Nagarjun,
>>
>> I faced the same issue and resolved it by deleting the 'runtime'
>> directory and then recompiling Nutch (along with the Selenium plugin).
>>
>> So cd into the Nutch trunk or branch and then execute:
>>
>> rm -r runtime
>> ant runtime
>>
>> Make sure you take a backup of your Nutch configuration before deleting
>> the runtime directory.
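
The steps above could be wrapped in a small shell function. This is only a
sketch under stated assumptions: the function name rebuild_nutch_runtime and
the NUTCH_BUILD_CMD variable are hypothetical, and the configuration is
assumed to live in runtime/local/conf, which may differ on your setup.

```shell
#!/bin/sh
# Hypothetical helper: back up the runtime configuration, delete runtime/,
# then rebuild. Usage: rebuild_nutch_runtime <nutch-source-dir> <backup-dir>
#
# NUTCH_BUILD_CMD is an assumption added for illustration so the build step
# can be swapped out; it defaults to the "ant runtime" command quoted above.
rebuild_nutch_runtime() {
    nutch_dir=$1
    backup_dir=$2
    # Assumed location of the edited configuration files.
    conf_dir="$nutch_dir/runtime/local/conf"

    # Copy the configuration out before anything is deleted.
    if [ -d "$conf_dir" ]; then
        mkdir -p "$backup_dir" || return 1
        cp -R "$conf_dir" "$backup_dir/" || return 1
    fi

    # Remove the old build output and rebuild from the source tree.
    # The subshell keeps the caller's working directory unchanged.
    (
        cd "$nutch_dir" || exit 1
        rm -rf runtime
        ${NUTCH_BUILD_CMD:-ant runtime}
    )
}
```

It would be called as, for example, rebuild_nutch_runtime ~/nutch
~/nutch-conf-backup, after which the saved files can be copied back into the
freshly built runtime directory.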
>>
>> Best regards,
>> Mohammad Al-Mohsin
>>
>> On Thu, Feb 19, 2015 at 10:16 PM, Nagarjun Pola <[email protected]> wrote:
>>
>>> Yes. I should do that.
>>>
>>> Thank You Jiaxin.
>>>
>>> Best,
>>> Nagarjun Pola
>>>
>>>
>>> On Thu, Feb 19, 2015 at 10:15 PM, Jiaxin Ye <[email protected]> wrote:
>>>
>>>> Hmm... Why don't you try to git clone a fresh copy of Nutch and then use
>>>> plain Nutch only, to see whether you can crawl or not?
>>>>
>>>> On Thu, Feb 19, 2015 at 10:09 PM, Nagarjun Pola <[email protected]> wrote:
>>>>
>>>>> Hmm, I don't think the crawler is being blocked for impoliteness, because
>>>>> I am using the default Nutch configuration, which is 1 request per second.
>>>>>
>>>>> And when I try to crawl the sample URL with the plugin disabled in
>>>>> nutch-site.xml, I can retrieve some links.
>>>>>
>>>>> The problem seems to be in the Selenium plugin. Though Firefox pops up,
>>>>> nothing is fetched.
>>>>>
>>>>> Best,
>>>>> Nagarjun Pola
>>>>>
>>>>>
>>>>> On Thu, Feb 19, 2015 at 10:05 PM, Jiaxin Ye <[email protected]> wrote:
>>>>>
>>>>>> Hi, my teammate is also running into this now, and I encountered it
>>>>>> last night. But I am able to crawl now, almost without doing anything.
>>>>>> My guess is that your crawler is blocked by the website for not being
>>>>>> polite. At least I believe that's the reason why I got the same
>>>>>> *Could not initialize class org.apache.http.impl.conn.* last night. I
>>>>>> don't know how to solve it, though. Fortunately, I think I am unbanned
>>>>>> now. Hope it helps.
>>>>>>
>>>>>> On Thu, Feb 19, 2015 at 9:44 PM, Nagarjun Pola <[email protected]> wrote:
>>>>>>
>>>>>>> I get the following error when I try with Selenium. Firefox pops up a
>>>>>>> couple of times but fetches nothing.
>>>>>>>
>>>>>>> Can anyone help me on this issue?
>>>>>>>
>>>>>>> -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
>>>>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>>>>   maxThreads    = 1
>>>>>>>   inProgress    = 1
>>>>>>>   crawlDelay    = 5000
>>>>>>>   minCrawlDelay = 0
>>>>>>>   nextFetchTime = 1424410799976
>>>>>>>   now           = 1424410803146
>>>>>>>   0. http://gcmd.gsfc.nasa.gov/
>>>>>>> fetch of
>>>>>>> http://gcmd.gsfc.nasa.gov/KeywordSearch/Home.do?Portal=amd&MetadataType=0
>>>>>>> failed with: java.lang.NoClassDefFoundError: Could not initialize class
>>>>>>> org.apache.http.impl.conn.ManagedHttpClientConnectionFactory
>>>>>>> -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=1, fetchQueues.getQueueCount=1
>>>>>>> * queue: http://gcmd.gsfc.nasa.gov
>>>>>>>   maxThreads    = 1
>>>>>>>   inProgress    = 0
>>>>>>>   crawlDelay    = 5000
>>>>>>>   minCrawlDelay = 0
>>>>>>>   nextFetchTime = 1424410808305
>>>>>>>   now           = 1424410804147
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
