[
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464589#comment-13464589
]
Jasper van Veghel commented on NUTCH-827:
-----------------------------------------
Hey guys,
This was some time ago, but take a look at the patch:
nutch-default.xml ..
<name>http.cookie.login.page</name>
<description>URL of the login page to derive the cookies from. Cookies will be
stored upon initialization and re-initialized upon expiration. Any URL request
attributes will be [..] POSTed to the page. [..]</description>
Apologies for the poor grammar in the original. ;-) Basically:
- Whenever protocol-httpclient performs an HTTP request, it will first check if
  there are cookies stored in the cookie jar.
- If there are cookies in the cookie jar AND none of the cookies have expired,
  it will do nothing.
- If there are no cookies in the cookie jar OR at least one of the cookies has
  expired, it will:
  - POST the URL / parameters provided in the "http.cookie.login.page" property
  - In the process of which, the cookie jar should get filled with the cookies
    you need to perform subsequent (authenticated) requests
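
To make the decision rule above concrete, here is a minimal sketch of the "do we need to log in again?" check. It is not the code from the patch: the class, the StoredCookie type and the needsLogin method are hypothetical names, and the cookie jar is modelled as a plain list rather than the HttpClient state the plugin actually uses.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is NOT the code from nutch-http-cookies.patch.
class CookieLoginCheck {

    /** Stand-in for a cookie in the jar. A null expiry models a session cookie. */
    static final class StoredCookie {
        final String name;
        final Instant expiresAt;

        StoredCookie(String name, Instant expiresAt) {
            this.name = name;
            this.expiresAt = expiresAt;
        }

        /** A cookie counts as expired once its expiry is at or before "now". */
        boolean isExpired(Instant now) {
            return expiresAt != null && !expiresAt.isAfter(now);
        }
    }

    /**
     * Returns true when the login page should be POSTed again:
     * the jar is empty OR at least one cookie has expired.
     */
    static boolean needsLogin(List<StoredCookie> jar, Instant now) {
        if (jar.isEmpty()) {
            return true;
        }
        for (StoredCookie cookie : jar) {
            if (cookie.isExpired(now)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<StoredCookie> jar = new ArrayList<>();
        System.out.println("empty jar, login needed: " + needsLogin(jar, now));

        jar.add(new StoredCookie("SESSION", now.plusSeconds(3600)));
        System.out.println("fresh cookie, login needed: " + needsLogin(jar, now));

        jar.add(new StoredCookie("OLD", now.minusSeconds(1)));
        System.out.println("one expired cookie, login needed: " + needsLogin(jar, now));
    }
}
```

Treating a single expired cookie as grounds for a fresh login is what keeps the rule simple, and it is also why cookies from unrelated requests can trigger unnecessary logins.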
The "http.cookie.login.page" property could contain something like
"http://abc/def?username=foo&password=bar" .. the 'username' and 'password'
parameters will then be POSTed to 'http://abc/def', which should result in
cookies being added to the cookie jar which is used for each subsequent request.
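
As an illustration of how such a property value splits into a POST target and form parameters, here is a rough sketch. The LoginPageProperty class and its two methods are hypothetical and only use the JDK; the patch itself may parse the value differently, and this sketch ignores URL fragments and repeated parameter names.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; an assumption about the splitting, not the patch's own parser.
class LoginPageProperty {

    /** Everything before the first '?' is the URL to POST to. */
    static String postTarget(String loginPage) {
        int q = loginPage.indexOf('?');
        return q < 0 ? loginPage : loginPage.substring(0, q);
    }

    /** Everything after the first '?' is read as name=value pairs joined by '&'. */
    static Map<String, String> postParameters(String loginPage) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = loginPage.indexOf('?');
        if (q < 0 || q == loginPage.length() - 1) {
            return params; // no query string, so nothing to POST
        }
        for (String pair : loginPage.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        String property = "http://abc/def?username=foo&password=bar";
        System.out.println(postTarget(property));     // prints http://abc/def
        System.out.println(postParameters(property)); // prints {username=foo, password=bar}
    }
}
```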
This isn't exactly a fool-proof solution (what if other requests generate
expired cookies? what if the login fails? etc.), but for the project for which
I wrote the patch, it suited our needs. Hope it helps!
> HTTP POST Authentication
> ------------------------
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New Feature
> Components: fetcher
> Affects Versions: 1.1, nutchgora
> Reporter: Jasper van Veghel
> Priority: Minor
> Labels: authentication
> Fix For: 1.6
>
> Attachments: nutch-http-cookies.patch
>
>
> I've created a patch against the trunk which adds very
> rudimentary POST-based authentication support. It takes a link from
> nutch-site.xml with a site to POST to and its respective parameters
> (username, password, etc.). It then checks upon every request whether any
> cookies have been initialized, and if none have, it fetches them from the
> given link.
> This isn't perfect but Works For Me (TM) as I generally only need to retrieve
> results from a single domain and so have no cookie overlap (i.e. if the
> domain cookies expire, all cookies disappear from the HttpClient and I can
> simply re-fetch them). A natural improvement would be to be able to specify
> one particular cookie to check the expiration-date against. If anyone is
> interested in this besides me, I'd be glad to put some more effort into making
> this more universally applicable.