[ https://issues.apache.org/jira/browse/NUTCH-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278832#comment-14278832 ]
Sebastian Nagel commented on NUTCH-1919:
----------------------------------------
+1
> Getting timeout when server returns Content-Length: 0
> ------------------------------------------------------
>
> Key: NUTCH-1919
> URL: https://issues.apache.org/jira/browse/NUTCH-1919
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Reporter: Julien Nioche
> Fix For: 1.10
>
> Attachments: NUTCH-1919.patch
>
>
> This has been investigated and fixed in the Storm-Crawler
> [https://github.com/DigitalPebble/storm-crawler/issues/48].
> {quote}
> curl -I "http://www.dailynewslosangeles.com/"
> HTTP/1.1 301 Moved Permanently
> Location: http://www.dailynews.com
> Connection: close
> Content-Length: 0
> Content-Type: text/html; charset=UTF-8
> {quote}
> When fetching with Nutch, we get a timeout exception:
> {quote}
> ./nutch parsechecker -D http.agent.name="PebbleCrawler"
> "http://www.dailynewslosangeles.com/"
> fetching: http://www.dailynewslosangeles.com/
> Fetch failed with protocol status: exception(16), lastModified=0:
> java.net.SocketTimeoutException: Read timed out
> {quote}
> The cause is that we try to read the response body from the stream even though
> we already know that the content length is 0, so the read blocks until the socket times out.
> The patch attached fixes the issue.
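The attached patch is not reproduced here. As a rough illustration of the idea only, the following Java sketch skips reading the body entirely when the parsed Content-Length is 0, and otherwise reads at most Content-Length bytes (or until end of stream when the header is absent, passed as -1). The class `BodyReader` and method `readBody` are hypothetical names, not Nutch's actual protocol-http API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BodyReader {

    /**
     * Hypothetical helper (not Nutch's API): reads an HTTP response body
     * given the parsed Content-Length value, or -1 if the header is absent.
     */
    public static byte[] readBody(InputStream in, int contentLength) throws IOException {
        // Content-Length: 0 means there is no body. Return immediately instead of
        // calling read(), which could block until the socket read timeout fires.
        if (contentLength == 0) {
            return new byte[0];
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int remaining = contentLength; // only used when contentLength > 0
        while (contentLength < 0 || remaining > 0) {
            int toRead = contentLength < 0 ? buffer.length : Math.min(buffer.length, remaining);
            int n = in.read(buffer, 0, toRead);
            if (n == -1) {
                break; // end of stream: the server closed the connection
            }
            out.write(buffer, 0, n);
            if (contentLength > 0) {
                remaining -= n;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A stream that throws on read() stands in for a socket on which
        // read() would block because the server sends no body bytes.
        InputStream neverSends = new InputStream() {
            @Override
            public int read() throws IOException {
                throw new IOException("read() called on an empty-body response");
            }
        };
        System.out.println(readBody(neverSends, 0).length); // prints 0
    }
}
```

The key design choice is that the Content-Length check happens before any call to `read()`, so a 301 response like the one above returns immediately instead of waiting for the read timeout.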
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)