[https://issues.apache.org/jira/browse/HTTPCORE-162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602926#action_12602926]
Sam Berlin commented on HTTPCORE-162:
-------------------------------------
I think this stems from a misunderstanding (and perhaps incomplete
documentation) of how ThrottlingHttpClientHandler works. The throttling, as I
understand it, is there to keep the in-memory buffered contents of the pages
down, but I think it is applied per connection, not across all connections. It
looks like you're doing a semi-crawl, scanning each page for more links and
opening more connections. Since you're using an unbounded thread pool, each
connection is going to spawn more threads, each of those is going to spawn
still more, and every one of them creates its own throttled buffer of limited
size. Eventually there are so many threads running and so many buffers
allocated that the JVM runs out of memory.
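To see why the unbounded pool matters, here is a minimal standalone sketch that uses only java.util.concurrent (no HttpCore classes). It builds a pool with the same settings Executors.newCachedThreadPool() uses (no core threads, no thread limit, a SynchronousQueue hand-off), submits tasks that block the way a worker stuck on a slow connection would, and reports how many threads the pool created. The class and method names are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CachedPoolGrowth {

    /**
     * Submits {@code tasks} jobs that all block until released and returns how many
     * worker threads the pool had created while they were blocked.
     */
    static int threadsWhileBlocked(int tasks) throws InterruptedException {
        // Same settings Executors.newCachedThreadPool() uses: no core threads, no upper
        // limit, and a SynchronousQueue, so a task that finds no idle thread gets a new one.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS, new SynchronousQueue<Runnable>());
        CountDownLatch started = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                started.countDown();
                try {
                    release.await(); // stand-in for a worker waiting on a slow connection
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        started.await();             // every task is now running on some thread
        int threads = pool.getPoolSize();
        release.countDown();         // let all tasks finish
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return threads;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int tasks : new int[] {10, 100, 500}) {
            System.out.println(tasks + " blocked tasks -> "
                    + threadsWhileBlocked(tasks) + " threads");
        }
    }
}
```

The thread count tracks the number of stalled tasks rather than any fixed limit, and in the crawl described above each of those threads also brings its own buffer.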
There are a few ways to work around this.
One way is to use AsyncNHttpClientHandler (only available in httpcore-nio
snapshots right now), but that requires a pretty extensive change to the way
you're parsing links: you'd have to parse the results piecemeal instead of a
whole page at a time. (The async handler notifies you whenever some data is
available, but you aren't guaranteed that all of it is.)
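Piecemeal parsing mostly means coping with matches that straddle chunk boundaries. A rough sketch, assuming the bytes have already been decoded to text (a real handler would also need a stateful java.nio.charset.CharsetDecoder so multi-byte characters split across reads survive) and that links look like href="...". IncrementalLinkScanner is a hypothetical helper, not part of HttpCore, and it is a naive string match rather than a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: extracts href="..." values from text arriving in arbitrary chunks. */
public class IncrementalLinkScanner {
    private static final String MARKER = "href=\"";
    private final StringBuilder pending = new StringBuilder();

    /** Feeds the next chunk and returns the links that are now complete. */
    public List<String> feed(CharSequence chunk) {
        pending.append(chunk);
        List<String> links = new ArrayList<>();
        int pos = 0;
        int keepFrom;
        while (true) {
            int start = pending.indexOf(MARKER, pos);
            if (start < 0) {
                // No further marker: keep only enough of the tail to complete a split marker.
                keepFrom = Math.max(pos, pending.length() - (MARKER.length() - 1));
                break;
            }
            int valueStart = start + MARKER.length();
            int end = pending.indexOf("\"", valueStart);
            if (end < 0) {
                // Marker seen, but the closing quote has not arrived yet.
                keepFrom = start;
                break;
            }
            links.add(pending.substring(valueStart, end));
            pos = end + 1;
        }
        pending.delete(0, keepFrom); // drop everything already scanned
        return links;
    }

    public static void main(String[] args) {
        IncrementalLinkScanner scanner = new IncrementalLinkScanner();
        String[] chunks = {"<a hr", "ef=\"/a.html\">A</a><a href=\"/b", ".html\">B</a>"};
        for (String chunk : chunks) {
            System.out.println(scanner.feed(chunk));
        }
    }
}
```

Only the unfinished tail of the page is kept between calls, so per-connection memory stays small; an unterminated attribute value would keep growing, though, so a real implementation should cap it.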
Another approach is to use a fixed-size thread pool. This is the easiest, but
it will significantly reduce speed if there are some lagging, slower
connections, because those can occupy every thread in the pool.
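A sketch of the fixed-pool option, again using only the JDK: Executors.newFixedThreadPool(n) takes the place of the cached pool, and a helper (hypothetical name) records the peak number of tasks running at once. The pool size of 4 and the 5 ms sleep are arbitrary stand-ins.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class FixedPoolSketch {

    /** Runs {@code tasks} simulated slow jobs and returns the peak number running at once. */
    static int peakConcurrency(int poolSize, int tasks) throws InterruptedException {
        // Replacement for Executors.newCachedThreadPool(): never more than poolSize
        // threads; surplus tasks wait in the pool's (unbounded) work queue.
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(5); // stand-in for draining a slow connection
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("tasks did not finish");
        }
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrent tasks: " + peakConcurrency(4, 40));
    }
}
```

Surplus tasks queue instead of getting their own thread, which caps the number of concurrent workers; the flip side is the slowdown described above when a few slow connections hold all n threads.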
Another approach would be to hack into ThrottlingHttpClientHandler and make the
total buffer size a resource shared among all connections. That would be a
significant change, and it would have implications beyond slower connections:
some worker threads might starve others of the chance to read. Throttling
across multiple connections in a fair, non-blocking way is very difficult.
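To make that trade-off concrete, here is a rough sketch of a single byte budget shared by all connections, built on java.util.concurrent.Semaphore. It is not how ThrottlingHttpClientHandler works; SharedBufferBudget and its methods are hypothetical, and the 16 KB total is arbitrary. Reservation is non-blocking (tryAcquire) because an I/O thread should not block; the caller would stop reading that connection until space is released.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical: one byte budget shared by every connection instead of one buffer each. */
public class SharedBufferBudget {
    private final Semaphore bytes;

    public SharedBufferBudget(int totalBytes) {
        this.bytes = new Semaphore(totalBytes);
    }

    /**
     * Non-blocking: returns false when the shared budget cannot cover n more bytes,
     * in which case the caller should stop reading that connection for now.
     */
    public boolean tryReserve(int n) {
        return bytes.tryAcquire(n);
    }

    /** Called once buffered bytes have been consumed by a worker thread. */
    public void release(int n) {
        bytes.release(n);
    }

    public int available() {
        return bytes.availablePermits();
    }

    public static void main(String[] args) {
        SharedBufferBudget budget = new SharedBufferBudget(16 * 1024);
        System.out.println(budget.tryReserve(8 * 1024)); // connection A
        System.out.println(budget.tryReserve(8 * 1024)); // connection B
        System.out.println(budget.tryReserve(1));        // connection C is refused
        budget.release(8 * 1024);                        // A's worker drained its data
        System.out.println(budget.tryReserve(4 * 1024)); // C can read again
        System.out.println(budget.available());
    }
}
```

This is where fairness breaks down: according to the Semaphore Javadoc, tryAcquire takes permits immediately if they are available even on a fair semaphore, so a connection that needs a large chunk can keep losing to connections asking for small amounts.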
> Out of Memory when using ThrottlingHttpClientHandler
> -----------------------------------------------------
>
> Key: HTTPCORE-162
> URL: https://issues.apache.org/jira/browse/HTTPCORE-162
> Project: HttpComponents HttpCore
> Issue Type: Bug
> Components: HttpCore NIO
> Affects Versions: 4.0-beta1
> Reporter: maomaode
> Attachments: 162-testcase.patch
>
>
> I'm hitting an Out Of Memory error when using ThrottlingHttpClientHandler
> <http://hc.apache.org/httpcomponents-core/httpcore-nio/apidocs/org/apache/http/nio/protocol/ThrottlingHttpClientHandler.html>
> with Executors.newCachedThreadPool(). Will provide a test case later.
--
This message is automatically generated by JIRA.