[ 
https://issues.apache.org/jira/browse/CONNECTORS-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159743#comment-13159743
 ] 

Karl Wright commented on CONNECTORS-298:
----------------------------------------

When I attempt to crawl the above site, I get the following history:

{code}
11-29-2011 20:57:30.199  job end       1322618211892(test)                                    0    1
11-29-2011 20:57:27.566  fetch         https://learningresources.nga.gov/            -11      0    1    robots.txt says so
11-29-2011 20:57:27.551  robots parse  https:learningresources.nga.gov:443           SUCCESS  0    1
11-29-2011 20:57:27.147  fetch         https://learningresources.nga.gov/robots.txt  200      82   391
11-29-2011 20:57:22.147  fetch         http://learningresources.nga.gov              301      242  137
11-29-2011 20:57:17.177  fetch         http://learningresources.nga.gov/robots.txt   -103     0    729  java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
11-29-2011 20:57:10.150  job start     1322618211892(test)                                    0    1
{code}

The history reads newest-first.  Note that the initial fetch of robots.txt
over http failed with the exception (result code -103), but the subsequent
fetch of robots.txt over https succeeded (result code 200) with no errors.
It appears to me that the problem may be that the site itself is using SSL
even when the incoming request is not https.

Note also that the crawl stopped because robots.txt prohibited it (the final
fetch has result code -11, "robots.txt says so").
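
For reference, the exception text itself can be reproduced without any network
access.  The sketch below is not ManifoldCF code.  It assumes only the standard
JDK classes java.security.KeyStore and java.security.cert.PKIXParameters, and
the class and method names (EmptyTrustStoreDemo, describeEmptyTrustStore) are
made up for illustration.  It builds PKIX validation parameters from a trust
store that contains no certificates, which is the condition the message
complains about:

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.InvalidAlgorithmParameterException;
import java.security.KeyStore;
import java.security.cert.PKIXParameters;

// Hypothetical demo class; not part of ManifoldCF or commons-httpclient.
class EmptyTrustStoreDemo {

    // Returns the message of the exception raised when PKIX parameters are
    // built from a trust store with no trusted certificate entries.
    static String describeEmptyTrustStore() {
        try {
            // Create an in-memory key store and initialize it as empty.
            KeyStore empty = KeyStore.getInstance(KeyStore.getDefaultType());
            empty.load(null, null);

            // PKIXParameters requires at least one trust anchor, so this
            // constructor is expected to throw for an empty store.
            new PKIXParameters(empty);
            return "no exception";
        } catch (InvalidAlgorithmParameterException e) {
            return e.getMessage();
        } catch (GeneralSecurityException | IOException e) {
            // Any other failure is unexpected in this sketch.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeEmptyTrustStore());
    }
}
```

If the same condition is what the crawl hit, that would be consistent with a
trust manager being initialized from an empty trust store, but the sketch only
shows where the message comes from; it does not show which code path in the
connector leads there.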

                
> Web connector: SSL does not use custom SSL socket factory in all cases
> ----------------------------------------------------------------------
>
>                 Key: CONNECTORS-298
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-298
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 0.3
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 0.4
>
>
> When crawling learningresources.nga.gov, the web connector gets a strange 
> exception from certificate verification logic in Sun's SSL implementation.  
> The stack trace indicates that the ManifoldCF secure socket factory may not 
> have been used to set up the stream either.  Here's the trace:
>  INFO 2011-11-29 20:13:33,397 (Thread-535) - I/O exception (javax.net.ssl.SSLException) caught when processing request: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
> DEBUG 2011-11-29 20:13:33,397 (Thread-535) - java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
> javax.net.ssl.SSLException: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
>       at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:190)
>       at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1649)
>       at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1612)
>       at com.sun.net.ssl.internal.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1595)
>       at com.sun.net.ssl.internal.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1521)
>       at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:64)
>       at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>       at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>       at org.apache.commons.httpclient.HttpConnection.flushRequestOutputStream(Unknown Source)
>       at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.flushRequestOutputStream(Unknown Source)
>       at org.apache.commons.httpclient.HttpMethodBase.writeRequest(Unknown Source)
>       at org.apache.commons.httpclient.HttpMethodBase.execute(Unknown Source)
>       at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(Unknown Source)
>       at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(Unknown Source)
>       at org.apache.commons.httpclient.HttpClient.executeMethod(Unknown Source)
>       at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledConnection$ExecuteMethodThread.run(ThrottledFetcher.java:1244)
> Caused by: java.lang.RuntimeException: Unexpected error: java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
>       at sun.security.validator.PKIXValidator.<init>(PKIXValidator.java:57)
>       at sun.security.validator.Validator.getInstance(Validator.java:161)
>       at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.getValidator(X509TrustManagerImpl.java:108)
>       at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:204)
>       at com.sun.net.ssl.internal.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:249)
>       at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1185)
>       at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:136)
>       at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:593)
>       at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:529)
>       at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:893)
>       at com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1138)
>       at com.sun.net.ssl.internal.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:632)
>       at com.sun.net.ssl.internal.ssl.AppOutputStream.write(AppOutputStream.java:59)
>       ... 10 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
