[ 
https://issues.apache.org/jira/browse/NUTCH-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gerard Bouchar updated NUTCH-2561:
----------------------------------
    Description: 
protocol-http limits the size of the HTTP response body. However:
 * There is no limit on the size of the HTTP headers it reads. A bogus server 
could send an endless stream of distinct HTTP headers and cause the fetcher 
to run out of memory, or send the same HTTP header repeatedly and cause the 
fetcher to time out.
 * The same goes for the HTTP status line: its size is not checked.

This can be both a performance and a security problem.

Attached is an example Python implementation of a server (evilserver.py) that 
makes protocol-http receive huge amounts of data without being stopped by 
either http.getTimeout() or http.getMaxContent().
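
One way to close both gaps would be to cap each line of the response head as 
well as the head as a whole. The following is a minimal Python sketch of that 
idea, not the fix adopted in Nutch: read_head, read_limited_line, 
HeaderLimitExceeded and the two limits (8192 bytes per line, 65536 bytes in 
total) are hypothetical names and values chosen for illustration only.

```python
import io


class HeaderLimitExceeded(Exception):
    """Raised when the status line or the header section is too large."""


def read_limited_line(stream, max_line_bytes):
    # Request at most max_line_bytes + 1 bytes, so an oversized line is
    # detected without ever buffering the whole line in memory.
    line = stream.readline(max_line_bytes + 1)
    if len(line) > max_line_bytes:
        raise HeaderLimitExceeded("line longer than %d bytes" % max_line_bytes)
    return line


def read_head(stream, max_line_bytes=8192, max_total_bytes=65536):
    """Read the status line and headers, enforcing both limits (assumed values)."""
    status_line = read_limited_line(stream, max_line_bytes)
    total = len(status_line)
    headers = []
    while True:
        line = read_limited_line(stream, max_line_bytes)
        total += len(line)
        # The total cap also bounds an endless run of short, repeated headers.
        if total > max_total_bytes:
            raise HeaderLimitExceeded("head larger than %d bytes" % max_total_bytes)
        # A blank line (or end of stream) terminates the header section.
        if line in (b"\r\n", b"\n", b""):
            break
        headers.append(line.rstrip(b"\r\n"))
    return status_line.rstrip(b"\r\n"), headers


if __name__ == "__main__":
    # Simulate a server that repeats the same header far past the total cap.
    evil = io.BytesIO(b"HTTP/1.1 200 OK\r\n" + b"X-Junk: 1\r\n" * 100000)
    try:
        read_head(evil)
    except HeaderLimitExceeded as exc:
        print("rejected:", exc)
```

Because the total cap applies to bytes rather than to the number of headers, 
the same check covers both the out-of-memory case (many distinct headers) and 
the slow, repeated-header case, independently of any read timeout.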


  was:
protocol-http limits the size of the HTTP response body. However:
 * There is no limit on the size of the HTTP headers it reads. A bogus server 
could send an endless stream of distinct HTTP headers and cause the fetcher 
to run out of memory, or send the same HTTP header repeatedly and cause the 
fetcher to time out.
 * The same goes for the HTTP status line: its size is not checked.

This can be both a performance and a security problem.


> protocol-http can be made to read arbitrarily large HTTP responses
> ------------------------------------------------------------------
>
>                 Key: NUTCH-2561
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2561
>             Project: Nutch
>          Issue Type: Sub-task
>            Reporter: Gerard Bouchar
>            Priority: Critical
>         Attachments: evilserver.py
>
>
> protocol-http limits the size of the HTTP response body. However:
>  * There is no limit on the size of the HTTP headers it reads. A bogus 
> server could send an endless stream of distinct HTTP headers and cause the 
> fetcher to run out of memory, or send the same HTTP header repeatedly and 
> cause the fetcher to time out.
>  * The same goes for the HTTP status line: its size is not checked.
> This can be both a performance and a security problem.
> Attached is an example Python implementation of a server (evilserver.py) that 
> makes protocol-http receive huge amounts of data without being stopped by 
> either http.getTimeout() or http.getMaxContent().



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
