[ https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lewis John McGibbney updated NUTCH-827:
---------------------------------------
Attachment: NUTCH-827-trunkv2.patch
Updated patch for trunk that addresses [~wastl-nagel]'s comments.
* I've moved the additions to httpclient-auth.xml to
httpclient-auth.xml.template
* I've also added a primitive fallback that looks the form up by its 'name'
attribute when no element with the configured 'id' can be found:
{code}
// Look the login form up by its 'id' attribute first.
Element loginform = doc.getElementById(authConfigurer.getLoginFormId());
if (loginform == null) {
  LOGGER.debug("'id' attribute for form element is null, trying 'name'.");
  // Fall back to a <form> whose 'name' attribute matches the configured value.
  loginform = doc.select("form[name=" + authConfigurer.getLoginFormId() + "]").first();
  if (loginform == null) {
    LOGGER.debug("'name' attribute for form element is also null.");
    throw new IllegalArgumentException("No form exists: "
        + authConfigurer.getLoginFormId());
  }
}
{code}
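For anyone who wants to try the id-then-name fallback without jsoup on the classpath, it can be sketched with only the JDK. This is not the patch's code: it swaps jsoup for the JDK's DOM and XPath APIs, assumes the login page parses as well-formed XHTML, and the class and method names (`LoginFormLookup`, `findLoginForm`, `parse`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper illustrating the lookup order; not part of the NUTCH-827 patch.
public class LoginFormLookup {

  /** Finds a <form> whose 'id' equals formId, falling back to its 'name' attribute. */
  static Element findLoginForm(Document doc, String formId) throws XPathExpressionException {
    XPath xpath = XPathFactory.newInstance().newXPath();
    // Bind $formId as an XPath variable so quotes in the value cannot break the query.
    xpath.setXPathVariableResolver(qname -> formId);
    Element form = (Element) xpath.evaluate("//form[@id=$formId]", doc, XPathConstants.NODE);
    if (form == null) {
      // No form with a matching 'id'; try the 'name' attribute instead.
      form = (Element) xpath.evaluate("//form[@name=$formId]", doc, XPathConstants.NODE);
    }
    if (form == null) {
      throw new IllegalArgumentException("No form exists: " + formId);
    }
    return form;
  }

  /** Parses a well-formed XHTML string into a DOM document (assumption: real pages may not be XML). */
  static Document parse(String xhtml) throws Exception {
    return DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
  }

  public static void main(String[] args) throws Exception {
    Document doc = parse("<html><body><form name=\"login\" action=\"/auth\"/></body></html>");
    // Only a 'name' matches here, so the fallback branch finds the form.
    System.out.println(findLoginForm(doc, "login").getAttribute("action"));
  }
}
```

Binding the configured value as a variable, rather than concatenating it into the expression, means a value containing quotes still matches correctly; the jsoup selector above is built by concatenation and so implicitly assumes a plain identifier.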
The rest of your comments look OK to me, and with this patch I am able to fetch
content from secure databases.
> HTTP POST Authentication
> ------------------------
>
> Key: NUTCH-827
> URL: https://issues.apache.org/jira/browse/NUTCH-827
> Project: Nutch
> Issue Type: New Feature
> Components: protocol
> Affects Versions: 1.1, nutchgora
> Reporter: Jasper van Veghel
> Assignee: Lewis John McGibbney
> Priority: Minor
> Labels: authentication
> Fix For: 2.4, 1.10
>
> Attachments: NUTCH-827-trunk.patch, NUTCH-827-trunkv2.patch,
> http-client-form-authtication.patch, nutch-http-cookies.patch
>
>
> I've created a patch against the trunk which adds very rudimentary support
> for POST-based authentication. It reads a URL to POST to, along with its
> parameters (username, password, etc.), from nutch-site.xml. It then checks
> on every request whether any cookies have been initialized, and if none
> have, it fetches them from the given URL.
> This isn't perfect, but it Works For Me (TM), as I generally only need to
> retrieve results from a single domain and so have no cookie overlap (i.e. if
> the domain cookies expire, all cookies disappear from the HttpClient and I
> can simply re-fetch them). A natural improvement would be the ability to
> specify one particular cookie whose expiration date is checked. If anyone
> besides me is interested in this, I'd be glad to put some more effort into
> making it more universally applicable.
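The cookie check described in the issue, plus the suggested per-cookie refinement, can be sketched with the JDK's java.net cookie classes. This is a hypothetical illustration, not the code in the attached patches: it assumes Java 11+ (for java.net.http), and the class name, the `needsLogin`/`formBody`/`loginRequest` helpers, and the example cookie name `JSESSIONID` are all made up for the example.

```java
import java.net.CookieManager;
import java.net.CookieStore;
import java.net.HttpCookie;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the POST-login cookie check; not taken from the NUTCH-827 patches.
public class PostLoginCheck {

  /** Behaviour described in the issue: log in again whenever the store holds no cookies. */
  static boolean needsLogin(CookieStore store) {
    return store.getCookies().isEmpty();
  }

  /**
   * Suggested refinement: log in again only if the named cookie is absent or expired.
   * allMatch() is true for an empty stream, so a missing cookie also triggers a login.
   * The JDK's default in-memory store already omits expired cookies from getCookies();
   * the hasExpired() test guards other CookieStore implementations.
   */
  static boolean needsLogin(CookieStore store, String cookieName) {
    return store.getCookies().stream()
        .filter(c -> c.getName().equals(cookieName))
        .allMatch(HttpCookie::hasExpired);
  }

  /** Encodes the configured login parameters as application/x-www-form-urlencoded. */
  static String formBody(Map<String, String> params) {
    return params.entrySet().stream()
        .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
            + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
        .collect(Collectors.joining("&"));
  }

  /** Builds (but does not send) the login POST request. */
  static HttpRequest loginRequest(URI loginUrl, Map<String, String> params) {
    return HttpRequest.newBuilder(loginUrl)
        .header("Content-Type", "application/x-www-form-urlencoded")
        .POST(HttpRequest.BodyPublishers.ofString(formBody(params)))
        .build();
  }

  public static void main(String[] args) {
    // An HttpClient built with .cookieHandler(manager) would populate this store.
    CookieManager manager = new CookieManager();
    Map<String, String> params = new LinkedHashMap<>();
    params.put("username", "jdoe");
    params.put("password", "s3cret");
    if (needsLogin(manager.getCookieStore(), "JSESSIONID")) {
      HttpRequest login = loginRequest(URI.create("https://example.com/login"), params);
      System.out.println(login.method() + " " + login.uri() + " " + formBody(params));
    }
  }
}
```

Keeping request construction separate from sending lets the check be exercised without a network; which cookie to watch would presumably become another configurable property, something the issue leaves open.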