[ https://issues.apache.org/jira/browse/NUTCH-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280143#comment-14280143 ]
Hudson commented on NUTCH-1919:
-------------------------------
SUCCESS: Integrated in Nutch-trunk #2936 (See
[https://builds.apache.org/job/Nutch-trunk/2936/])
(NUTCH-1919) Getting timeout when server returns Content-Length: 0 (jnioche:
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1652391)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
> Getting timeout when server returns Content-Length: 0
> ------------------------------------------------------
>
> Key: NUTCH-1919
> URL: https://issues.apache.org/jira/browse/NUTCH-1919
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Reporter: Julien Nioche
> Fix For: 1.10
>
> Attachments: NUTCH-1919.patch
>
>
> This has been investigated and fixed in Storm-Crawler
> [https://github.com/DigitalPebble/storm-crawler/issues/48].
> {quote}
> curl -I "http://www.dailynewslosangeles.com/"
> HTTP/1.1 301 Moved Permanently
> Location: http://www.dailynews.com
> Connection: close
> Content-Length: 0
> Content-Type: text/html; charset=UTF-8
> {quote}
> When fetching with Nutch, we get a timeout exception:
> {quote}
> ./nutch parsechecker -D http.agent.name="PebbleCrawler" "http://www.dailynewslosangeles.com/"
> fetching: http://www.dailynewslosangeles.com/
> Fetch failed with protocol status: exception(16), lastModified=0:
> java.net.SocketTimeoutException: Read timed out
> {quote}
> The cause is that we try to read from the stream even though we know that the
> content length is 0, so the read waits for data that never arrives and
> eventually times out.
> The attached patch fixes the issue.
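The guard described above can be sketched roughly as follows. This is a minimal illustration and not the code from NUTCH-1919.patch or HttpResponse.java: the class ContentLengthGuard, the method readBody, and the TimingOutStream test double are hypothetical names, and a negative contentLength is assumed to mean "header absent".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketTimeoutException;

// Hypothetical sketch; names here are illustrative and not part of Nutch.
class ContentLengthGuard {

  /**
   * Reads a response body of the declared length.
   * Assumption: contentLength < 0 means the header was absent, in which
   * case we read until end of stream.
   */
  static byte[] readBody(InputStream in, int contentLength) throws IOException {
    // Nothing to read: return without touching the stream, so a server
    // that keeps the connection open cannot make us wait for a timeout.
    if (contentLength == 0) {
      return new byte[0];
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];
    int remaining = contentLength < 0 ? Integer.MAX_VALUE : contentLength;
    while (remaining > 0) {
      int n = in.read(buffer, 0, Math.min(buffer.length, remaining));
      if (n == -1) {
        break; // end of stream reached before the declared length
      }
      out.write(buffer, 0, n);
      remaining -= n;
    }
    return out.toByteArray();
  }

  /** Test double that fails on any read, standing in for a silent socket. */
  static class TimingOutStream extends InputStream {
    @Override
    public int read() throws IOException {
      throw new SocketTimeoutException("Read timed out");
    }
  }

  public static void main(String[] args) throws IOException {
    // Content-Length: 0, so the stream is never read and no exception occurs.
    byte[] empty = readBody(new TimingOutStream(), 0);
    System.out.println(empty.length);

    // Non-empty body is read up to the declared length.
    byte[] body = readBody(new ByteArrayInputStream("hello".getBytes("UTF-8")), 5);
    System.out.println(new String(body, "UTF-8"));
  }
}
```

Checking the length before the first read keeps the zero-length case independent of whether the server closes the connection promptly, which matters for redirects such as the 301 response quoted above.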
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)