[ 
https://issues.apache.org/jira/browse/NUTCH-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183490#comment-14183490
 ] 

Sebastian Nagel commented on NUTCH-1825:
----------------------------------------

Comments and reviews welcome! The problem is easily reproducible:
* first terminal (with the attached proxy.js and a minimal test document served 
by a local Apache):
{noformat}
% cat /var/www/test.html 
<html><head><title>test</title></head><body>test</body></html>
% nodejs -v
v0.10.25
% nodejs ./proxy.js
Listening on port 8080
{noformat}
* second terminal:
{noformat}
% bin/nutch plugin protocol-http org.apache.nutch.protocol.http.Http http://localhost:8080/test.html
Status: exception(16), lastModified=0: java.net.SocketTimeoutException: Read timed out
% bin/nutch parsechecker http://localhost:8080/test.html
% less .../hadoop.log
2014-10-24 22:37:13,214 ERROR http.Http - Failed to get protocol output
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        ...
        at java.io.FilterInputStream.read(FilterInputStream.java:107)
        at org.apache.nutch.protocol.http.HttpResponse.readPlainContent(HttpResponse.java:293)
        at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:221)
{noformat}
* 2.x is also affected!
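
The trace shows the read blocking in readPlainContent until the socket times out, but the comment does not say why. One plausible reading, assumed here and not confirmed by the report, is that the body is read until end-of-stream on a connection the proxy keeps open. Below is a minimal Java sketch of a reader that stops once a declared Content-Length has been consumed. The names BoundedBodyReader and readBody are hypothetical; this is neither Nutch code nor the attached patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedBodyReader {

    /**
     * Reads a response body from the stream.
     *
     * Assumption: contentLength is the value of a Content-Length header,
     * or a negative number when no such header was sent.
     * With a non-negative contentLength the loop stops as soon as that many
     * bytes have arrived, so it never waits for the peer to close the socket.
     * With a negative contentLength it falls back to reading until EOF.
     */
    public static byte[] readBody(InputStream in, int contentLength) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int remaining = contentLength;

        while (contentLength < 0 || remaining > 0) {
            // Never request more bytes than are still owed.
            int want = contentLength < 0 ? buf.length : Math.min(buf.length, remaining);
            int n = in.read(buf, 0, want);
            if (n == -1) {
                break; // peer closed the stream early
            }
            out.write(buf, 0, n);
            if (contentLength >= 0) {
                remaining -= n;
            }
        }
        return out.toByteArray();
    }
}
```

If this reading is right, a caller would pass the parsed Content-Length value and the socket's input stream, and the call would return without waiting for the read timeout.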


> protocol-http may hang for certain web pages
> --------------------------------------------
>
>                 Key: NUTCH-1825
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1825
>             Project: Nutch
>          Issue Type: Bug
>          Components: protocol
>    Affects Versions: 1.9
>            Reporter: Phu Kieu
>            Priority: Minor
>         Attachments: HttpResponse.java.patch, NUTCH-1825-trunk-v2.patch, 
> NUTCH-1825-trunk-v3.patch, proxy.js
>
>
> There is a rare case where protocol-http will wait for data even when all the 
> data has been sent.
> Patch is attached; please test and confirm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
