Hi,

In JCR-390 we added support for text extraction in background threads.
This was done with the PooledTextExtractor class that maintains a pool
of threads for this purpose. Do we need that pool, or could we simply
start a new thread for each extraction task? That would simplify the
indexing code.

The time to start a new thread is probably minimal compared to that of
parsing a document. And when you're parsing a lot of large documents,
much of the time is spent waiting for IO, so more concurrent threads
should generally give you better throughput.
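
To make the comparison concrete, here is a rough sketch of the two
options using java.util.concurrent. It is not the actual
PooledTextExtractor code; the class and method names (ExtractionSketch,
extractInNewThread, extractInPool) and the pool size of 2 are made up
for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

// Hypothetical sketch, not Jackrabbit code.
class ExtractionSketch {

    // Option A: start a fresh thread for every extraction task.
    // No pool to manage, but also no upper bound on the thread count.
    static <T> Future<T> extractInNewThread(Callable<T> task) {
        FutureTask<T> future = new FutureTask<>(task);
        Thread thread = new Thread(future, "text-extractor");
        thread.setDaemon(true); // do not keep the JVM alive for extraction
        thread.start();
        return future;
    }

    // Option B: reuse a fixed set of worker threads (size 2 is arbitrary).
    // Tasks wait in the executor's queue when all workers are busy.
    static final ExecutorService POOL = Executors.newFixedThreadPool(2, r -> {
        Thread thread = new Thread(r, "pooled-text-extractor");
        thread.setDaemon(true);
        return thread;
    });

    static <T> Future<T> extractInPool(Callable<T> task) {
        return POOL.submit(task);
    }
}
```

Both variants hand back a Future, so the calling code would look the
same either way. The difference is that the pooled variant queues work
once all pool threads are busy, while the per-task variant never queues
but also puts no bound on the number of threads.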

BR,

Jukka Zitting
