Sorry, I made a typo. I am actually using 1.11-trunk. Thank you anyway!

Junpeng Luo

> On Oct 8, 2015, at 2:45 PM, Mattmann, Chris A (3980) 
> <[email protected]> wrote:
> 
> You should be using Nutch 1.11-trunk for your assignment.
> 
> Sent from my iPhone
> 
> On Oct 8, 2015, at 1:55 PM, Junpeng Luo <[email protected]> wrote:
> 
>> Hi everyone,
>> 
>> I am using Nutch 1.10 and trying to use the interactive Selenium plugin from
>> the following link:
>> https://github.com/apache/nutch/tree/trunk/src/plugin/protocol-interactiveselenium
>> 
>> I am trying to crawl some websites that require login.
>> 
>> What I found is that when the website first returns an HTTP response with
>> code 403, Nutch still treats the response code as 403 and does not fetch
>> the page, even if the interactive Selenium plugin processes the website and
>> gets the new content after logging in successfully.
>> 
>> When I went through the code of the interactive Selenium plugin, I found that
>> it doesn't update the HTTP response status after getting the new content. Is
>> that supposed to happen? Or am I missing some detail about using the plugin?
>> 
>> Best,
>> 
>> Junpeng Luo
>> 
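
To make the behavior asked about above concrete, here is a minimal,
self-contained Java sketch of the idea: if the first response was denied
(401 or 403) but the interactive step later produced real content, report a
200 together with that content. This is only an illustration under stated
assumptions. StatusOverrideSketch, FetchResult, and afterInteraction are
hypothetical names and are not taken from Nutch or from the
protocol-interactiveselenium plugin, whose actual classes may look quite
different.

```java
// Hypothetical sketch: none of these names come from Nutch or the plugin.
class StatusOverrideSketch {

    /** Minimal stand-in for a protocol response: a status code plus page content. */
    static final class FetchResult {
        final int statusCode;
        final String content;

        FetchResult(int statusCode, String content) {
            this.statusCode = statusCode;
            this.content = content;
        }
    }

    /**
     * Decide what to report after the interactive (browser-driven) step has run.
     * Assumption: non-empty rendered content after a 401/403 means the login
     * succeeded, so the result is reported as 200 with the rendered content.
     * In every other case the initial result is returned unchanged.
     */
    static FetchResult afterInteraction(FetchResult initial, String renderedContent) {
        boolean wasDenied = initial.statusCode == 401 || initial.statusCode == 403;
        boolean gotContent = renderedContent != null && !renderedContent.trim().isEmpty();
        if (wasDenied && gotContent) {
            return new FetchResult(200, renderedContent);
        }
        return initial;
    }

    public static void main(String[] args) {
        FetchResult denied = new FetchResult(403, "");
        FetchResult updated = afterInteraction(denied, "<html>logged in</html>");
        System.out.println(updated.statusCode + " " + updated.content);
    }
}
```

Whether overriding the status like this is appropriate is a design question
for the plugin: a page that still shows a login form would also count as
"content", so a real check would probably need to be stricter than this one.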
