Re: Interactive selenium plugin issue

Mattmann, Chris A (3980) Thu, 08 Oct 2015 14:46:18 -0700

You should be using Nutch 1.11-trunk for your assignment.


On Oct 8, 2015, at 1:55 PM, Junpeng Luo <[email protected]> wrote:

Hi everyone,

I am using Nutch 1.10 and trying to use the interactive Selenium plugin from 
the following link:
https://github.com/apache/nutch/tree/trunk/src/plugin/protocol-interactiveselenium

I am trying to crawl some websites that require login.

What I found is that when a website first returns an HTTP response with status 
code 403, Nutch still treats the response as a 403 and does not fetch the page, 
even if the interactive Selenium plugin processes the site and gets the new 
content after logging in successfully.

When I went through the code of the interactive Selenium plugin, I found that 
it doesn't update the HTTP response status after getting the new content. Is 
that the intended behavior? Or am I missing some detail about using the plugin?
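
A minimal sketch of the behavior described above. It uses hypothetical 
stand-in classes (SketchResponse, replaceContentWithRenderedPage, wouldStore), 
not the actual Nutch or plugin code, and it assumes the fetcher decides based 
only on the status code recorded from the first HTTP exchange:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-ins for illustration only; these are NOT Nutch classes.
public class StaleStatusSketch {

    /** Simplified response: a status code plus a body. */
    public static final class SketchResponse {
        private final int code;   // status from the initial HTTP exchange
        private byte[] content;   // body, possibly replaced later

        public SketchResponse(int code, byte[] content) {
            this.code = code;
            this.content = content;
        }

        public int getCode() { return code; }
        public byte[] getContent() { return content; }
        public void setContent(byte[] content) { this.content = content; }
    }

    /** Assumption: the browser has logged in and rendered the page as html. */
    public static void replaceContentWithRenderedPage(SketchResponse r, String html) {
        r.setContent(html.getBytes(StandardCharsets.UTF_8));
        // The status code is left untouched, so it still reflects the first response.
    }

    /** A fetcher-style check that looks only at the status code. */
    public static boolean wouldStore(SketchResponse r) {
        return r.getCode() == 200;
    }

    public static void main(String[] args) {
        SketchResponse r =
            new SketchResponse(403, "Forbidden".getBytes(StandardCharsets.UTF_8));
        replaceContentWithRenderedPage(r, "<html>logged in</html>");
        // The body is new, but the decision is still driven by the stale 403.
        System.out.println("code=" + r.getCode() + " stored=" + wouldStore(r));
    }
}
```

In this sketch the page is rejected even though the body now holds the 
logged-in content, because nothing resets the stored status code.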

Best,

Junpeng Luo
