[ https://issues.apache.org/jira/browse/NUTCH-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel updated NUTCH-1825:
-----------------------------------
Attachment: NUTCH-1825-trunk-v3.patch
            NUTCH-1825-trunk-v2.patch
Hi [~pkieu], thanks! With your proxy the problem is easy to reproduce. Indeed,
the read blocks once Content-Length is reached. I tried your patch: with it,
the fetched and stored content may exceed the content limit.
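A simplified illustration of why the read can block, assuming the server (here, the proxy) keeps the connection open after sending the body, so the stream never reports end-of-file. The class `KeepAliveReadSketch`, its stub stream and both read methods are hypothetical and are not Nutch's `HttpResponse` code; the stub throws where a real socket read would simply wait.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class KeepAliveReadSketch {

  /**
   * Hypothetical stand-in for a keep-alive connection: it serves `total` bytes
   * and then never signals EOF. A real socket read would block at that point;
   * this stub throws instead so the "hang" becomes observable.
   */
  static class KeepAliveStream extends InputStream {
    private final int total;
    private int served = 0;

    KeepAliveStream(int total) {
      this.total = total;
    }

    @Override
    public int read() throws IOException {
      byte[] one = new byte[1];
      return read(one, 0, 1) == -1 ? -1 : (one[0] & 0xff);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      if (len == 0) {
        return 0;
      }
      if (served >= total) {
        throw new IOException("would block: all data sent, connection still open");
      }
      // Payload bytes are left as zeros; only the byte count matters here.
      int n = Math.min(len, total - served);
      served += n;
      return n;
    }
  }

  /** Naive loop: keeps reading until EOF, which never comes on a keep-alive connection. */
  static byte[] readUntilEof(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8 * 1024];
    int i;
    while ((i = in.read(buf)) != -1) {
      out.write(buf, 0, i);
    }
    return out.toByteArray();
  }

  /** Stops once Content-Length bytes have arrived, so no further read is issued. */
  static byte[] readToContentLength(InputStream in, int contentLength) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[8 * 1024];
    int length = 0;
    int i;
    while (length < contentLength && (i = in.read(buf)) != -1) {
      out.write(buf, 0, i);
      length += i;
    }
    return out.toByteArray();
  }

  /** Returns true if the naive loop would block (here: throw) after all data is sent. */
  static boolean naiveLoopHangs(int contentLength) {
    try {
      readUntilEof(new KeepAliveStream(contentLength));
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) throws IOException {
    int contentLength = 20_000;
    System.out.println("naive loop hangs: " + naiveLoopHangs(contentLength));
    byte[] body = readToContentLength(new KeepAliveStream(contentLength), contentLength);
    System.out.println("bytes read with Content-Length check: " + body.length);
  }
}
```

The only difference between the two loops is the `length < contentLength` guard, which avoids issuing one more read after the last body byte has arrived.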
I attached two patches:
* v2: keeps the current behaviour, but issues no further read once the content
length is reached. The fetched and stored content may fall short of the content
limit by less than the buffer size (8 kB).
* v3: additionally, if the next chunk could reach the content limit, reads fewer
bytes so that the content limit is used in full.
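A sketch of the two strategies as I read the v2/v3 descriptions above, not the attached patches themselves. `ContentLimitSketch`, `readDropOverflow`, `readClampToLimit` and `storedLength` are hypothetical names; `limit` stands for the effective content limit and is assumed to already be capped by Content-Length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ContentLimitSketch {

  // Read buffer size; 8 kB as mentioned above.
  static final int BUFFER_SIZE = 8 * 1024;

  /**
   * v2-style: read full buffers, discard a chunk that would push the total past
   * the limit, and issue no further read once the limit is reached. The stored
   * content can fall short of the limit by less than the buffer size.
   */
  static byte[] readDropOverflow(InputStream in, int limit) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream(BUFFER_SIZE);
    byte[] buf = new byte[BUFFER_SIZE];
    int length = 0;
    while (length < limit) {
      int i = in.read(buf);
      if (i == -1 || i > limit - length) {
        break; // EOF, or this chunk would overflow the limit: drop it
      }
      out.write(buf, 0, i);
      length += i;
    }
    return out.toByteArray();
  }

  /**
   * v3-style: never request more than the bytes still allowed under the limit,
   * so the limit can be filled exactly.
   */
  static byte[] readClampToLimit(InputStream in, int limit) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream(BUFFER_SIZE);
    byte[] buf = new byte[BUFFER_SIZE];
    int length = 0;
    while (length < limit) {
      int i = in.read(buf, 0, Math.min(BUFFER_SIZE, limit - length));
      if (i == -1) {
        break; // server sent less than the limit
      }
      out.write(buf, 0, i);
      length += i;
    }
    return out.toByteArray();
  }

  /** Demo helper: stored length for a body of `bodyLength` bytes under `limit`. */
  static int storedLength(boolean clamp, int bodyLength, int limit) {
    InputStream in = new ByteArrayInputStream(new byte[bodyLength]);
    try {
      byte[] stored = clamp ? readClampToLimit(in, limit) : readDropOverflow(in, limit);
      return stored.length;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // 20,000-byte body, content limit of 10,000 bytes.
    System.out.println("v2-style: " + storedLength(false, 20_000, 10_000)); // 8192
    System.out.println("v3-style: " + storedLength(true, 20_000, 10_000)); // 10000
  }
}
```

With a 20,000-byte body and a 10,000-byte limit, the v2-style loop stores 8,192 bytes (the second 8 kB chunk would overflow and is dropped), while the v3-style loop asks for only the remaining 1,808 bytes and stores exactly 10,000.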
> protocol-http may hang for certain web pages
> --------------------------------------------
>
> Key: NUTCH-1825
> URL: https://issues.apache.org/jira/browse/NUTCH-1825
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 1.9
> Reporter: Phu Kieu
> Priority: Minor
> Attachments: HttpResponse.java.patch, NUTCH-1825-trunk-v2.patch,
> NUTCH-1825-trunk-v3.patch, proxy.js
>
>
> There is a rare case where protocol-http will wait for data even when all the
> data has been sent.
> Patch is attached; please test and confirm.
--
This message was sent by Atlassian JIRA
(v6.2#6252)