Definitely sounds like a bug to me. A patch would be super awesome ;)

-- Jimmy

On Thu, Oct 8, 2015 at 1:54 PM, Junpeng Luo <[email protected]> wrote:

> Hi everyone,
>
> I am using Nutch 1.10 and am trying to use the interactive Selenium plugin at
> the following link:
>
> https://github.com/apache/nutch/tree/trunk/src/plugin/protocol-interactiveselenium
>
> I am trying to crawl some websites that require a login.
>
> What I found is that when a website returns an HTTP response with status
> code 403 the first time, Nutch still treats the response as a 403 and does
> not fetch the page. This happens even though the interactive Selenium plugin
> processes the website, logs in successfully, and gets the new content.
>
> When I went through the code of the interactive Selenium plugin, I found that
> it does not update the HTTP response status after getting the new content. Is
> that supposed to happen, or am I missing some detail about how to use the
> plugin?
>
> Best,
>
> Junpeng Luo
>
>
