Improved background text extraction
-----------------------------------
Key: JCR-2219
URL: https://issues.apache.org/jira/browse/JCR-2219
Project: Jackrabbit Content Repository
Issue Type: Improvement
Components: indexing, jackrabbit-core
Reporter: Jukka Zitting
Priority: Minor
As recently discussed on the mailing list (see
http://markmail.org/message/syt7lc2guzapt7la), the current approach to text
extraction in background threads doesn't work that well, especially with the
Tika-based extractors that support streamed parsing of many document types.
Also, *all* of the extracted text streams are currently buffered into
Strings before being passed into the Lucene index. It would be good if we could
somehow get back to passing just Readers to Lucene.
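
To make the difference concrete, here is a minimal sketch using only java.io. It
is not Jackrabbit or Lucene code: the class ReaderVersusString and the methods
countTokens and bufferFully are hypothetical names invented for illustration,
and countTokens merely stands in for an indexer that pulls text from a Reader.
The point is that a consumer reading from a Reader only ever holds a small
chunk, while the buffered variant has to keep the whole extracted text in
memory first.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration; none of these names come from Jackrabbit or Lucene.
final class ReaderVersusString {

    // Stand-in for an indexer: pulls text in small chunks and counts
    // whitespace-separated tokens without holding the whole text at once.
    static int countTokens(Reader reader) throws IOException {
        char[] chunk = new char[8];
        int tokens = 0;
        boolean inToken = false; // carried across chunk boundaries
        int n;
        while ((n = reader.read(chunk)) != -1) {
            for (int i = 0; i < n; i++) {
                boolean whitespace = Character.isWhitespace(chunk[i]);
                if (!whitespace && !inToken) {
                    tokens++;
                }
                inToken = !whitespace;
            }
        }
        return tokens;
    }

    // Buffered approach: drain the Reader into a String before anything
    // else can look at the text. Memory use grows with document size.
    static String bufferFully(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] chunk = new char[8];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            text.append(chunk, 0, n);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String sample = "streamed text extraction example";

        // Streamed: the consumer reads directly from the Reader.
        int streamed = countTokens(new StringReader(sample));

        // Buffered: the full text is materialized first, then consumed.
        String buffered = bufferFully(new StringReader(sample));
        int viaString = countTokens(new StringReader(buffered));

        System.out.println(streamed + " " + viaString);
    }
}
```

Both paths see the same text; they differ only in how much of it is held in
memory at once, which is why handing Readers to the index is attractive for
large documents.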