Hello,

I used Nutch 1.11 to crawl pages behind a login page.
The http-auth configuration looked like this:

---------------------------------------------------------------------------
<?xml version="1.0"?>
<auth-configuration>
  <credentials authMethod="formAuth"
               loginUrl="https://sso.coremedia.com/zendesk/index.jsp?brand_id=3187316&amp;locale_id=1&amp;return_to=https%3A%2F%2Fsupport.coremedia.com%2Fhc%2Fen-us&amp;timestamp=1464019963"
               loginFormId="loginForm"
               loginRedirect="true">
    <loginPostData>
      <field name="user[email]"
             value="username"/>
      <field name="user[password]"
             value="password"/>
    </loginPostData>
    <additionalPostHeaders>
    </additionalPostHeaders>
  </credentials>
</auth-configuration>
--------------------------------------------------------------------

Everything worked fine. Then I updated to 1.13 (I also tried 1.18) and changed 
the configuration as described in the http-auth.xml file:

-----------------------------------------------------------------------------

<auth-configuration>
  <credentials authMethod="formAuth"
               loginUrl="https://sso.coremedia.com/zendesk/index.jsp?brand_id=3187316&amp;locale_id=1&amp;return_to=https%3A%2F%2Fsupport.coremedia.com%2Fhc%2Fen-us&amp;timestamp=1464019963"
               loginFormId="loginForm"
               loginRedirect="true">
    <loginPostData>
      <field name="user[email]"
             value="username"/>
      <field name="user[password]"
             value="password"/>
    </loginPostData>
    <additionalPostHeaders>
    </additionalPostHeaders>
    <removedFormFields>
    </removedFormFields>
    <loginCookie>
      <policy>BROWSER_COMPATIBILITY</policy>
    </loginCookie>
  </credentials>

</auth-configuration>

-----------------------------------------------

Now the login no longer works. After some redirects, the server returns an 
HTTP 403 response. I tried all loginCookie policy values, but none of them worked.
The login is to a Zendesk support system with Atlassian Crowd as the login 
provider. Has anything changed between 1.11 and 1.13? Is something stricter 
than before?
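
To rule out a malformed file on my side, the new configuration can be checked 
for well-formedness with a short script. This is only a sketch: the loginUrl 
below is a shortened placeholder, the helper name summarize_auth_config is made 
up for illustration, and the script does not use or reproduce any Nutch code. 
It only confirms that the XML parses and shows the values a parser would see.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the 1.13-style configuration.
# Assumption: the loginUrl is a placeholder, not the real SSO address.
CONFIG = """<?xml version="1.0"?>
<auth-configuration>
  <credentials authMethod="formAuth"
               loginUrl="https://sso.example.com/index.jsp?brand_id=1&amp;locale_id=1"
               loginFormId="loginForm"
               loginRedirect="true">
    <loginPostData>
      <field name="user[email]" value="username"/>
      <field name="user[password]" value="password"/>
    </loginPostData>
    <additionalPostHeaders>
    </additionalPostHeaders>
    <removedFormFields>
    </removedFormFields>
    <loginCookie>
      <policy>BROWSER_COMPATIBILITY</policy>
    </loginCookie>
  </credentials>
</auth-configuration>
"""


def summarize_auth_config(xml_text):
    """Parse the config and return the values a standard XML parser sees.

    Hypothetical helper for illustration; raises ET.ParseError if the
    document is not well-formed.
    """
    root = ET.fromstring(xml_text)
    cred = root.find("credentials")
    if cred is None:
        raise ValueError("no <credentials> element found")
    return {
        # Entities such as &amp; are decoded by the parser.
        "login_url": cred.get("loginUrl"),
        "form_id": cred.get("loginFormId"),
        "post_fields": {
            f.get("name"): f.get("value")
            for f in cred.findall("loginPostData/field")
        },
        "cookie_policy": cred.findtext("loginCookie/policy"),
    }


if __name__ == "__main__":
    for key, value in summarize_auth_config(CONFIG).items():
        print(key, "=", value)
```

If the script raises a parse error, the problem is in the file itself (for 
example an unquoted attribute or a bare & in the URL) rather than in the login 
flow.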


I found a very similar question on this mailing list 
(https://www.mail-archive.com/user@nutch.apache.org/msg15746.html) from 
2017, which has no solution.

I would appreciate any help!

Best regards

Michael


Dr. Michael Fritsch
Technical Editor

T: +49.40.325587.214
E: michael.frit...@coremedia.com

CoreMedia GmbH - Be iconic
Ludwig-Erhard-Str. 18
20459 Hamburg, Germany
www.coremedia.com
------------------------------------------------------------
Managing Director: Sören Stamer
Commercial Register: Amtsgericht Hamburg, HR B 162480
----------------------------------------------------------------------