[ https://issues.apache.org/jira/browse/CONNECTORS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159743#comment-13159743 ]
Karl Wright commented on CONNECTORS-298:
----------------------------------------
When I attempt to crawl the above site, I get the following history:
{code}
11-29-2011 20:57:30.199 job end 1322618211892(test) 0 1
11-29-2011 20:57:27.566 fetch https://learningresources.nga.gov/ -11 0 1 robots.txt says so
11-29-2011 20:57:27.551 robots parse https:learningresources.nga.gov:443 SUCCESS 0 1
11-29-2011 20:57:27.147 fetch https://learningresources.nga.gov/robots.txt 200 82 391
11-29-2011 20:57:22.147 fetch http://learningresources.nga.gov 301 242 137
11-29-2011 20:57:17.177 fetch http://learningresources.nga.gov/robots.txt -103 0 729 java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
11-29-2011 20:57:10.150 job start 1322618211892(test) 0 1
{code}
Note that the initial fetch of robots.txt over http failed with the exception,
while the subsequent fetch of robots.txt over https succeeded with no errors.
It appears to me that the site itself may be using SSL even when the incoming
request is not https.
Note also that the crawl stopped because robots.txt prohibited it.
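
For reference, the exception text in the history is the message the JDK produces when PKIX validation parameters are built from an empty set of trust anchors. The following is a minimal sketch that reproduces only that message. The class and method names are hypothetical, and it does not touch ManifoldCF or commons-httpclient code, so it says nothing about which socket factory was actually in use during the crawl.

```java
import java.security.InvalidAlgorithmParameterException;
import java.security.cert.PKIXParameters;
import java.security.cert.TrustAnchor;
import java.util.Collections;

// Hypothetical demo class, not part of ManifoldCF.
final class EmptyTrustAnchorsDemo {

    // Builds PKIX parameters from an empty trust anchor set and returns the
    // resulting exception message, or null if no exception was thrown.
    static String describeFailure() {
        try {
            new PKIXParameters(Collections.<TrustAnchor>emptySet());
            return null;
        } catch (InvalidAlgorithmParameterException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure());
    }
}
```

If this is the situation inside the crawler, it would suggest that the trust store consulted for that connection contained no trusted certificates, which is consistent with the default trust manager appearing in the stack trace.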
> Web connector: SSL does not use custom SSL socket factory in all cases
> ----------------------------------------------------------------------
>
> Key: CONNECTORS-298
> URL: https://issues.apache.org/jira/browse/CONNECTORS-298
> Project: ManifoldCF
> Issue Type: Bug
> Components: Web connector
> Affects Versions: ManifoldCF 0.3
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 0.4
>
>
> When crawling learningresources.nga.gov, the web connector gets a strange
> exception from certificate verification logic in Sun's SSL implementation.
> The stack trace indicates that the ManifoldCF secure socket factory may not
> have been used to set up the stream either. Here's the trace:
> INFO 2011-11-29 20:13:33,397 (Thread-535) - I/O exception (javax.net.ssl.SSLException) caught when processing request: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
> DEBUG 2011-11-29 20:13:33,397 (Thread-535) - java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
> javax.net.ssl.SSLException: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
>     at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:190)
>     at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1649)
>     at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1612)
>     at com.sun.net.ssl.internal.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1595)
>     at com.sun.net.ssl.internal.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1521)
>     at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:64)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>     at org.apache.commons.httpclient.HttpConnection.flushRequestOutputStream(Unknown Source)
>     at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.flushRequestOutputStream(Unknown Source)
>     at org.apache.commons.httpclient.HttpMethodBase.writeRequest(Unknown Source)
>     at org.apache.commons.httpclient.HttpMethodBase.execute(Unknown Source)
>     at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(Unknown Source)
>     at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(Unknown Source)
>     at org.apache.commons.httpclient.HttpClient.executeMethod(Unknown Source)
>     at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledConnection$ExecuteMethodThread.run(ThrottledFetcher.java:1244)
> Caused by: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
>     at sun.security.validator.PKIXValidator.<init>(PKIXValidator.java:57)
>     at sun.security.validator.Validator.getInstance(Validator.java:161)
>     at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.getValidator(X509TrustManagerImpl.java:108)
>     at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:204)
>     at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:249)
>     at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1185)
>     at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:136)
>     at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:593)
>     at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:529)
>     at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:893)
>     at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1138)
>     at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:632)
>     at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:59)
>     ... 10 more