[ 
https://issues.apache.org/jira/browse/NUTCH-827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315858#comment-15315858
 ] 

Steve Yao commented on NUTCH-827:
---------------------------------

This seems to work until I get a cookie with $Domain=".domain.com". The 
leading "." causes the cookie to be rejected:
{code}
WARN  httpclient.HttpMethodBase - Cookie rejected: "$Version=0; 
FORMCRED=itIsL35rD+qWVYeq3Gc7l5nWKAYx3Pz/YpsjEX86ftJlta; $Path=/; 
$Domain=.domain.com". Domain attribute ".domain.com" violates RFC 2109: host 
minus domain may not contain any dots
{code}
I think I can fix it by adding a CookiePolicy configuration setting to the 
login method. Thoughts?
{code:java}
public void login() throws Exception {
    // Make sure cookies are turned on.
    CookieHandler.setDefault(new CookieManager());
    // The cookie policy could be configured here...
    String pageContent = httpGetPageContent(authConfigurer.getLoginUrl());
    List<NameValuePair> params = getLoginFormParams(pageContent);
    sendPost(authConfigurer.getLoginUrl(), params);
}
{code}
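
A rough sketch of what such a setting could look like with java.net.CookieManager. The class name CookiePolicySketch, the helper names, and the policy strings ("accept_all", "accept_none") are hypothetical and not part of the patch; the only real API used is java.net.CookieManager, CookieHandler and CookiePolicy.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.util.Locale;

// Hypothetical helper, not part of the NUTCH-827 patch.
final class CookiePolicySketch {

    // Maps a (hypothetical) configuration value to a java.net.CookiePolicy.
    // Unknown or missing values fall back to the JDK default policy.
    static CookiePolicy policyFor(String name) {
        if (name == null) {
            return CookiePolicy.ACCEPT_ORIGINAL_SERVER;
        }
        switch (name.trim().toLowerCase(Locale.ROOT)) {
            case "accept_all":
                return CookiePolicy.ACCEPT_ALL;
            case "accept_none":
                return CookiePolicy.ACCEPT_NONE;
            default:
                return CookiePolicy.ACCEPT_ORIGINAL_SERVER;
        }
    }

    // Installs a CookieManager with the chosen policy as the JVM-wide default.
    // Passing null as the store makes CookieManager use its in-memory store.
    static CookieManager installCookieManager(String policyName) {
        CookieManager manager = new CookieManager(null, policyFor(policyName));
        CookieHandler.setDefault(manager);
        return manager;
    }

    public static void main(String[] args) {
        // In login(), this call would replace the plain
        // CookieHandler.setDefault(new CookieManager()) line.
        installCookieManager("accept_all");
    }
}
```

One caveat: the warning above is logged by commons-httpclient's HttpMethodBase, so if the rejection happens there rather than in java.net, the policy would probably have to be set on the HttpClient instead (for example a browser-compatibility policy via its params). I have not checked that against the patch.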

> HTTP POST Authentication
> ------------------------
>
>                 Key: NUTCH-827
>                 URL: https://issues.apache.org/jira/browse/NUTCH-827
>             Project: Nutch
>          Issue Type: New Feature
>          Components: protocol
>    Affects Versions: 1.1, nutchgora
>            Reporter: Jasper van Veghel
>            Assignee: Lewis John McGibbney
>            Priority: Minor
>              Labels: authentication, memex
>             Fix For: 1.10
>
>         Attachments: NUTCH-827-trunk-v3.patch, NUTCH-827-trunk.patch, 
> NUTCH-827-trunkv2.patch, http-client-form-authtication.patch, 
> nutch-http-cookies.patch
>
>
> I've created a patch against the trunk which adds very 
> rudimentary POST-based authentication support. It takes a link from 
> nutch-site.xml with a site to POST to and its respective parameters 
> (username, password, etc.). It then checks upon every request whether any 
> cookies have been initialized, and if none have, it fetches them from the 
> given link.
> This isn't perfect but Works For Me (TM) as I generally only need to retrieve 
> results from a single domain and so have no cookie overlap (i.e. if the 
> domain cookies expire, all cookies disappear from the HttpClient and I can 
> simply re-fetch them). A natural improvement would be to be able to specify 
> one particular cookie to check the expiration date against. If anyone 
> besides me is interested in this, I'd be glad to put some more effort into 
> making this more universally applicable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)