I remember someone else in the class mentioning the same problem, so
you're probably not alone and it may well be a bug.

On Thu, Oct 8, 2015 at 2:56 PM Junpeng Luo <[email protected]> wrote:

> Sorry, I made a typo; I am actually using 1.11-trunk. Thank you anyway!
>
> Junpeng Luo
>
> On Oct 8, 2015, at 2:45 PM, Mattmann, Chris A (3980) <
> [email protected]> wrote:
>
> You should be using Nutch 1.11-trunk for your assignment.
>
> Sent from my iPhone
>
> On Oct 8, 2015, at 1:55 PM, Junpeng Luo <[email protected]> wrote:
>
> Hi everyone,
>
> I am using Nutch 1.10 and trying to use the interactive Selenium plugin at
> the following link:
>
> https://github.com/apache/nutch/tree/trunk/src/plugin/protocol-interactiveselenium
>
> I am trying to crawl some websites that require login.
>
> What I found is that when a website first returns an HTTP response with
> code 403, Nutch still treats the response code as 403 and does not fetch
> the page, even if the interactive Selenium plugin processes the site and
> gets the new content after logging in successfully.
>
> When I went through the code of the interactive Selenium plugin, I found
> that it doesn't update the HTTP response status after getting the new
> content. Is that the intended behavior? Or am I missing some detail about
> using the plugin?
>
> Best,
>
> Junpeng Luo
>
>
>
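
For illustration, the behaviour described in the report can be sketched as
follows. This is not the plugin's actual code: SketchResponse, BrowserStep,
keepOriginalStatus and updateStatus are hypothetical names made up for this
example, and treating any non-empty rendered page as a 200 is only one
possible policy, assumed here for simplicity.

```java
import java.nio.charset.StandardCharsets;

public class StatusUpdateSketch {

    /** Hypothetical stand-in for a protocol response; not a Nutch class. */
    static final class SketchResponse {
        final int code;
        final byte[] content;

        SketchResponse(int code, byte[] content) {
            this.code = code;
            this.content = content;
        }
    }

    /** Hypothetical stand-in for the browser-driven step (login, then reload). */
    interface BrowserStep {
        String renderAfterLogin(String url);
    }

    private static byte[] toBytes(String html) {
        return html == null ? new byte[0] : html.getBytes(StandardCharsets.UTF_8);
    }

    /** Behaviour as reported: the content is replaced, the first status is kept. */
    static SketchResponse keepOriginalStatus(int initialCode, String url, BrowserStep browser) {
        String html = browser.renderAfterLogin(url);
        return new SketchResponse(initialCode, toBytes(html));
    }

    /**
     * One possible fix (an assumption, not the project's patch): if the
     * browser step produced non-empty content, report 200 instead of the
     * status seen on the first request.
     */
    static SketchResponse updateStatus(int initialCode, String url, BrowserStep browser) {
        String html = browser.renderAfterLogin(url);
        boolean gotContent = html != null && !html.isEmpty();
        int code = gotContent ? 200 : initialCode;
        return new SketchResponse(code, toBytes(html));
    }

    public static void main(String[] args) {
        // Fake browser step that always "logs in" and returns a page.
        BrowserStep fake = url -> "<html>members only</html>";
        String url = "http://example.com/private";

        SketchResponse before = keepOriginalStatus(403, url, fake);
        SketchResponse after = updateStatus(403, url, fake);
        System.out.println("kept status: " + before.code);
        System.out.println("updated status: " + after.code);
    }
}
```

In this sketch both calls return the same page content, but only the second
one carries a status that a fetcher checking for success would accept.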
