Hello, I used Nutch 1.11 to crawl pages behind a login page. The http-auth configuration looked like this:
---------------------------------------------------------------------------
<?xml version="1.0"?>
<auth-configuration>
  <credentials authMethod="formAuth"
               loginUrl="https://sso.coremedia.com/zendesk/index.jsp?brand_id=3187316&amp;locale_id=1&amp;return_to=https%3A%2F%2Fsupport.coremedia.com%2Fhc%2Fen-us&amp;timestamp=1464019963"
               loginFormId="loginForm"
               loginRedirect="true">
    <loginPostData>
      <field name="user[email]" value="username"/>
      <field name="user[password]" value="password"/>
    </loginPostData>
    <additionalPostHeaders>
    </additionalPostHeaders>
  </credentials>
</auth-configuration>
---------------------------------------------------------------------------

Everything worked fine. Then I updated to 1.13 (I also tried 1.18) and changed the configuration as described in the http-auth.xml file:

---------------------------------------------------------------------------
<auth-configuration>
  <credentials authMethod="formAuth"
               loginUrl="https://sso.coremedia.com/zendesk/index.jsp?brand_id=3187316&amp;locale_id=1&amp;return_to=https%3A%2F%2Fsupport.coremedia.com%2Fhc%2Fen-us&amp;timestamp=1464019963"
               loginFormId="loginForm"
               loginRedirect="true">
    <loginPostData>
      <field name="user[email]" value="username"/>
      <field name="user[password]" value="password"/>
    </loginPostData>
    <additionalPostHeaders>
    </additionalPostHeaders>
    <removedFormFields>
    </removedFormFields>
    <loginCookie>
      <policy>BROWSER_COMPATIBILITY</policy>
    </loginCookie>
  </credentials>
</auth-configuration>
---------------------------------------------------------------------------

Now the login no longer works. After some redirects, the server returns an HTTP 403 response. I tried all loginCookie policy values, but none of them worked. The login is to a Zendesk support system with Atlassian Crowd as the login provider.

Has anything changed between 1.11 and 1.13? Is something stricter than before? I found a very similar question on this mailing list from 2017 (https://www.mail-archive.com/user@nutch.apache.org/msg15746.html), which has no solution.
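For context, form authentication in Nutch 1.x is handled by the protocol-httpclient plugin, so a setup like the one described presumably has that plugin enabled in nutch-site.xml. The following is only a sketch of such wiring: the plugin list is an assumption based on the stock defaults, not necessarily the exact configuration used here, and the auth file name shown is the Nutch default.

```xml
<!-- nutch-site.xml (sketch; plugin list is assumed, adjust to the actual setup) -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- protocol-httpclient replaces protocol-http, because form authentication lives in that plugin -->
    <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <property>
    <name>http.auth.file</name>
    <!-- authentication file looked up in conf/; httpclient-auth.xml is the default name -->
    <value>httpclient-auth.xml</value>
  </property>
</configuration>
```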
I would appreciate any help!

Best regards
Michael

Dr. Michael Fritsch
Technical Editor
CoreMedia GmbH