[ https://issues.apache.org/jira/browse/NUTCH-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17473091#comment-17473091 ]
Sebastian Nagel commented on NUTCH-2929:
----------------------------------------
There weren't that many Tika warnings:
- (non-parsing Fetcher, 160 threads) 400-500 during the first minute, then up
to 50 for the next 2-4 minutes, none after that
- (parsing Fetcher, 80 threads) ~2000 during the first minute, disappearing
entirely within a few minutes.
Note: Tika is used for MIME type detection whether Fetcher is parsing or not.
> Fetcher: start threads slowly to avoid that resources are temporarily
> exhausted
> -------------------------------------------------------------------------------
>
> Key: NUTCH-2929
> URL: https://issues.apache.org/jira/browse/NUTCH-2929
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.18
> Reporter: Sebastian Nagel
> Priority: Minor
> Fix For: 1.19
>
>
> Fetcher spins up all threads without any delay. As a result, certain
> resources may be temporarily exhausted if all threads start fetching their
> first pages simultaneously.
> The issue was observed through Tika warnings about overuse of the SAXParser
> pool, which appeared only during the first 2-5 minutes of fetching a segment.
> See https://lists.apache.org/thread/lo6b9wdlxy2lz12wmosldgl9x9ov1cks - adding
> a short delay between thread launches makes the warnings disappear.
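
For illustration only, the idea of a short delay between thread launches
could be sketched as below. This is not the actual Nutch change: the class
StaggeredStart, the method startStaggered and its parameters are hypothetical
names, and the delay value is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Nutch code: launch worker threads one by one
// with a pause in between, instead of starting them all at once.
class StaggeredStart {

    /**
     * Starts threadCount threads running the given task, sleeping
     * delayMillis between consecutive launches, and returns the
     * threads that were started.
     */
    static List<Thread> startStaggered(int threadCount, long delayMillis, Runnable task) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(task, "worker-" + i);
            t.start();
            threads.add(t);
            // No need to wait after the last thread has been launched.
            if (delayMillis > 0 && i < threadCount - 1) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop launching more threads.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return threads;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        // 4 threads, 50 ms apart (both values chosen arbitrarily for the demo).
        List<Thread> threads = startStaggered(4, 50, runs::incrementAndGet);
        for (Thread t : threads) {
            t.join();
        }
        System.out.println("threads run: " + runs.get());
    }
}
```

With a delay of d milliseconds and n threads, the last thread starts roughly
(n - 1) * d milliseconds after the first, so the delay trades a slightly
slower ramp-up for a smaller burst of simultaneous first requests.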