Hi everyone,

I am using Nutch 1.10 and am trying to use the interactive Selenium plugin at 
the following link: 
https://github.com/apache/nutch/tree/trunk/src/plugin/protocol-interactiveselenium

I am trying to crawl some websites that require a login. 

What I found is that when a website returns an HTTP response with status code 
403 the first time, Nutch still treats the page as a 403 and does not fetch it, 
even though the interactive Selenium plugin processes the website and gets the 
new content after logging in successfully. 

When I went through the code of the interactive Selenium plugin, I found that it 
doesn't update the HTTP response status after getting the new content. Is that 
intended behavior? Or am I missing some detail about using the plugin? 

Best,

Junpeng Luo
