[ http://issues.apache.org/jira/browse/NUTCH-28?page=comments#action_62131 ]

Doug Bakewell commented on NUTCH-28:
------------------------------------

I have tried the attached code. For simple HTTPS pages it worked fine. We have
a few HTTPS pages that redirect to a login page on a different host. These
pages produced the following exception, but crawling continued and the
resulting database returned the expected results, containing neither the login
page nor the redirect page. I'm not sure whether the first line of the log
below is related.

I'm not sure this is a real problem. Perhaps the exception just needs to be
handled somewhere to suppress the output.

050401 021440 Going to buffer response body of large or unknown size. Using getResponseAsStream instead is recommended.
050401 021440 Error getting URI host
org.apache.commons.httpclient.HttpException: Redirect from host demo.nfis.org to ca.nfis.org is not supported
        at org.apache.commons.httpclient.HttpMethodBase.checkValidRedirect(HttpMethodBase.java:1237)
        at org.apache.commons.httpclient.HttpMethodBase.processRedirectResponse(HttpMethodBase.java:1185)
        at org.apache.commons.httpclient.HttpMethodBase.isRetryNeeded(HttpMethodBase.java:967)
        at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1089)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:643)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:497)
        at net.nutch.protocol.https.HTTPS.getContent(HTTPS.java:22)
        at net.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:107)
050401 021440 Invalid Redirect URI from: https://demo.nfis.org:443/mapserver/nai.phtml to: https://ca.nfis.org/access/login.jsp?DACS_ERROR_CODE=902&DACS_VERSION=1.2&DACS_FEDERATION=nfis.org&DACS_JURISDICTION=DEMO&DACS_HOSTNAME=demo.nfis.org&DACS_USER_AGENT=Jakarta%20Commons-HttpClient%2f2.0.2&DACS_ERROR_URL=https://demo.nfis.org:443/mapserver/nai.phtml
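
The exception text in the trace indicates that the HttpClient in use (2.0.2,
going by the user agent in the redirect URL) rejects a redirect whose target
host differs from the original host. As a rough illustration of that kind of
check, here is a minimal sketch using only java.net.URI. The class and method
names (RedirectCheck, isCrossHost) are hypothetical and are not part of Nutch
or HttpClient, and the target URL is shortened from the one in the log.

```java
import java.net.URI;

// Hypothetical helper, for illustration only: not Nutch or HttpClient code.
public class RedirectCheck {

    /**
     * Returns true when following the redirect would change host.
     * A relative target is resolved against the source URI first,
     * so it always stays on the same host.
     */
    public static boolean isCrossHost(String from, String to) {
        URI source = URI.create(from);
        URI target = source.resolve(to);
        String sourceHost = source.getHost();
        String targetHost = target.getHost();
        if (sourceHost == null || targetHost == null) {
            // No host could be determined; assume the redirect is not safe to follow.
            return true;
        }
        return !sourceHost.equalsIgnoreCase(targetHost);
    }

    public static void main(String[] args) {
        // The redirect from the log above, with the query string left out.
        System.out.println(isCrossHost(
                "https://demo.nfis.org:443/mapserver/nai.phtml",
                "https://ca.nfis.org/access/login.jsp")); // prints: true
    }
}
```

A fetcher could use a check like this to log a single short line for
cross-host redirects instead of a full stack trace, which would address the
noisy output without changing what ends up in the database.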


> No support for https
> --------------------
>
>          Key: NUTCH-28
>          URL: http://issues.apache.org/jira/browse/NUTCH-28
>      Project: Nutch
>         Type: Improvement
>     Reporter: Stefan Groschupf
>  Attachments: protocol-https.tgz
>
> transferred from:
> http://sourceforge.net/tracker/index.php?func=detail&aid=986240&group_id=59548&atid=491356
> submitted by:
> Konstantin Ignatyev
> The crawl tool does not support the https protocol.
> I have created a very simple implementation based on
> commons-httpclient and attached it to the report. It
> seems to work, although it requires commons-httpclient.jar
> and commons-logging.jar in the lib directory.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira
