[ https://issues.apache.org/jira/browse/NUTCH-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291222#comment-13291222 ]
Markus Jelsma commented on NUTCH-1342:
--------------------------------------

Unless there are objections or improvements, I'll commit this one in the next few days.

> Read time out protocol-http
> ---------------------------
>
>                 Key: NUTCH-1342
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1342
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.4, 1.5
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Critical
>             Fix For: 1.6
>
>         Attachments: NUTCH-1342-1.6-1.patch
>
>
> For some reason, some URLs always time out with protocol-http but not with protocol-httpclient. The stack trace is always the same:
> {code}
> 2012-04-20 11:25:44,275 ERROR http.Http - Failed to get protocol output
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(SocketInputStream.java:129)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>         at java.io.FilterInputStream.read(FilterInputStream.java:116)
>         at java.io.PushbackInputStream.read(PushbackInputStream.java:169)
>         at java.io.FilterInputStream.read(FilterInputStream.java:90)
>         at org.apache.nutch.protocol.http.HttpResponse.readPlainContent(HttpResponse.java:228)
>         at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:157)
>         at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
>         at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:138)
> {code}
> Some example URLs (with the HTTP status each returns):
> * 404 http://www.fcgroningen.nl/tribunenamen/stemmen/
> * 301 http://shop.fcgroningen.nl/aanbieding
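
For readers unfamiliar with the exception in the stack trace above, the following is a minimal, self-contained sketch of the general mechanism by which `java.net` raises `SocketTimeoutException` on a read: a blocking read on a socket that has a read timeout set while the peer sends no data. It is not taken from the Nutch code base or from the attached patch, and it does not attempt to diagnose this issue. The class name `ReadTimeoutDemo`, the method `readTimesOut`, and the 200 ms timeout are hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; not part of Nutch.
public class ReadTimeoutDemo {

    /**
     * Connects to a local listener that accepts the connection but never
     * writes any bytes, then attempts a blocking read with the given timeout.
     * Returns true if the read failed with SocketTimeoutException.
     */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        // Port 0 asks the OS for any free port; the listener is bound to loopback only.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // SO_TIMEOUT: the maximum time a single read() may block.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks, because the accepted side never writes
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

In this sketch the timeout fires because the peer is silent while the client still expects bytes. Whether something comparable happens for the URLs listed in the report is an open question that the sketch does not answer.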