Phu Kieu created NUTCH-1824:
-------------------------------

             Summary: protocol-http using proxy not working with https sites
                 Key: NUTCH-1824
                 URL: https://issues.apache.org/jira/browse/NUTCH-1824
             Project: Nutch
          Issue Type: Bug
          Components: protocol
    Affects Versions: 1.9
            Reporter: Phu Kieu
            Priority: Minor


HTTPS sites cannot be fetched with protocol-http when a proxy is configured.

Inspecting the source shows that the plugin does not issue a CONNECT request
to the proxy when it encounters an https URL, so no tunnel is established for
the TLS connection.
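
For illustration, here is a minimal sketch of what tunnelling an https request through an HTTP proxy generally involves: send CONNECT to the proxy, check for a 2xx reply, then start TLS on the same socket. It is not Nutch code and not a proposed patch. The class name ProxyTunnelSketch, its method names, and the 10-second connect timeout are all assumptions made up for this example, and proxy authentication is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

/** Hypothetical helper (not part of Nutch): opens a TLS connection through an HTTP proxy. */
final class ProxyTunnelSketch {

  /** Builds the CONNECT request asking the proxy to open a raw TCP tunnel to host:port. */
  static String buildConnectRequest(String host, int port) {
    String target = host + ":" + port;
    return "CONNECT " + target + " HTTP/1.1\r\n"
        + "Host: " + target + "\r\n"
        + "\r\n";
  }

  /**
   * Reads the proxy's response head up to and including the blank line and
   * returns the status code. Reads one byte at a time so that nothing after
   * the head (i.e. tunnelled TLS data) is consumed.
   */
  static int readStatusCode(InputStream in) throws IOException {
    byte[] end = {'\r', '\n', '\r', '\n'};
    ByteArrayOutputStream head = new ByteArrayOutputStream();
    int matched = 0; // how many bytes of the terminating CRLF CRLF have been seen
    int b;
    while (matched < end.length && (b = in.read()) != -1) {
      head.write(b);
      matched = (b == end[matched]) ? matched + 1 : (b == '\r' ? 1 : 0);
    }
    if (matched < end.length) {
      throw new IOException("Proxy closed the connection before finishing its response");
    }
    String text = new String(head.toByteArray(), StandardCharsets.ISO_8859_1);
    String statusLine = text.substring(0, text.indexOf("\r\n"));
    String[] parts = statusLine.split(" ");
    if (parts.length < 2) {
      throw new IOException("Malformed status line from proxy: " + statusLine);
    }
    try {
      return Integer.parseInt(parts[1]);
    } catch (NumberFormatException e) {
      throw new IOException("Malformed status line from proxy: " + statusLine);
    }
  }

  /** Connects to the proxy, requests a tunnel, then layers TLS over the same socket. */
  static SSLSocket openTunnel(String proxyHost, int proxyPort, String host, int port)
      throws IOException {
    Socket raw = new Socket();
    try {
      raw.connect(new InetSocketAddress(proxyHost, proxyPort), 10_000); // assumed timeout
      OutputStream out = raw.getOutputStream();
      out.write(buildConnectRequest(host, port).getBytes(StandardCharsets.US_ASCII));
      out.flush();

      int status = readStatusCode(raw.getInputStream());
      if (status / 100 != 2) {
        throw new IOException("Proxy refused CONNECT: HTTP " + status);
      }

      // Wrap the already-connected socket; host and port are those of the origin server.
      SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
      SSLSocket tls = (SSLSocket) factory.createSocket(raw, host, port, true);
      tls.startHandshake();
      return tls;
    } catch (IOException e) {
      raw.close();
      throw e;
    }
  }
}
```

With a tunnel like this, the ordinary HTTP request would be written to the returned SSLSocket as if it were a direct connection to the https server.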

2014-08-15 09:27:20,295 INFO  api.HttpRobotRulesParser - Couldn't get robots.txt for https://*****: java.net.ConnectException: Connection refused
2014-08-15 09:27:20,296 ERROR http.Http - Failed to get protocol output
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:382)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:241)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:228)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:431)
        at java.net.Socket.connect(Socket.java:527)
        at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:126)
        at org.apache.nutch.protocol.http.Http.getResponse(Http.java:72)
        at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:183)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:715)




--
This message was sent by Atlassian JIRA
(v6.2#6252)
