[ https://issues.apache.org/jira/browse/NUTCH-2929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472772#comment-17472772 ]
Markus Jelsma commented on NUTCH-2929:
--------------------------------------

I haven't seen this problem in our crawler before. The 10 ms default delay is reasonable; crawlers that are unaffected by the problem should indeed not be impacted by it.

+1

> Fetcher: start threads slowly to avoid that resources are temporarily exhausted
> -------------------------------------------------------------------------------
>
>                 Key: NUTCH-2929
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2929
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.18
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.19
>
>
> Fetcher spins up all threads without any delay. If all threads start fetching
> their first pages simultaneously, certain resources may be temporarily exhausted.
> The issue showed up as Tika warnings about overuse of the SAXParser pool,
> which appeared only during the first 2-5 minutes of fetching a segment.
> Adding a short delay between thread launches makes the warnings disappear
> (see https://lists.apache.org/thread/lo6b9wdlxy2lz12wmosldgl9x9ov1cks).

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
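To illustrate the idea discussed above (staggering thread launches by a short delay rather than starting all fetcher threads at once), here is a minimal sketch in Java. It is not Nutch's actual Fetcher code: the class name `StaggeredThreadStart`, the method `startStaggered`, and the thread name prefix `FetcherThread-` are hypothetical. The 10 ms value only mirrors the default delay mentioned in the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: start worker threads one by one with a pause between
// launches, so they do not all hit shared resources (e.g. a parser pool)
// at the same instant.
public class StaggeredThreadStart {

    /**
     * Starts numThreads threads running task, sleeping delayMs between launches.
     * Returns the threads that were started (fewer than numThreads if the
     * calling thread is interrupted while waiting).
     */
    public static List<Thread> startStaggered(int numThreads, long delayMs, Runnable task) {
        List<Thread> threads = new ArrayList<>(numThreads);
        for (int i = 0; i < numThreads; i++) {
            Thread t = new Thread(task, "FetcherThread-" + i); // hypothetical name
            t.setDaemon(true);
            t.start();
            threads.add(t);
            // Pause before the next launch, but not after the last thread.
            if (delayMs > 0 && i < numThreads - 1) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop launching more threads.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return threads;
    }

    public static void main(String[] args) throws InterruptedException {
        int numThreads = 5;
        long delayMs = 10; // mirrors the 10 ms default delay from the discussion
        CountDownLatch done = new CountDownLatch(numThreads);
        long start = System.nanoTime();
        List<Thread> threads = startStaggered(numThreads, delayMs, done::countDown);
        done.await(5, TimeUnit.SECONDS);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(threads.size() + " threads started in about " + elapsedMs + " ms");
    }
}
```

With N threads and a delay d, the full ramp-up takes at least (N - 1) * d. For example, 100 threads at 10 ms take roughly one second. This cost is negligible for a fetch that runs for minutes, which fits the comment that unaffected users should not notice the change.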