Hi,

In JCR-390 we added support for text extraction in background threads. This was done with the PooledTextExtractor class, which maintains a pool of threads for this purpose. Do we need that pool, or could we simply start a new thread for each extraction task? That would simplify the indexing code.
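To make the question concrete, here is a rough sketch of what thread-per-task extraction could look like. It is not the actual PooledTextExtractor code: the `Extractor` interface, the `extractInBackground` method, and the thread name are all hypothetical, made up for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;

public class ThreadPerExtraction {

    /** Hypothetical stand-in for a real text extractor (an assumption, not the Jackrabbit API). */
    interface Extractor {
        String extract(byte[] data) throws Exception;
    }

    /**
     * Starts a dedicated thread for a single extraction task and returns
     * a handle that the caller can use to wait for the extracted text.
     */
    static FutureTask<String> extractInBackground(Extractor extractor, byte[] data) {
        FutureTask<String> task = new FutureTask<>(() -> extractor.extract(data));
        Thread thread = new Thread(task, "text-extraction");
        thread.setDaemon(true); // do not keep the JVM alive for a pending extraction
        thread.start();
        return task;
    }

    public static void main(String[] args) throws Exception {
        // Toy extractor: decodes the bytes as UTF-8 and upper-cases them.
        Extractor upper = d -> new String(d, StandardCharsets.UTF_8).toUpperCase();
        FutureTask<String> result =
                extractInBackground(upper, "hello".getBytes(StandardCharsets.UTF_8));
        // Wait up to five seconds for the background thread to finish.
        System.out.println(result.get(5, TimeUnit.SECONDS)); // prints HELLO
    }
}
```

Note that this sketch puts no upper bound on the number of concurrent threads, which is the main thing the pool gives us today.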
The time to start a new thread is probably minimal compared to the time spent parsing a document. And when you're parsing a lot of large documents, much of the time is spent waiting for IO, so the more concurrent threads you have, the better throughput you get.

BR,
Jukka Zitting
