[ https://issues.apache.org/jira/browse/JCR-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739690#action_12739690 ]
Marcel Reutegger commented on JCR-2219:
---------------------------------------
Re-applied some of the 801135 changes to make test execution more reliable.
svn revision: 801375
> Improved background text extraction
> -----------------------------------
>
> Key: JCR-2219
> URL: https://issues.apache.org/jira/browse/JCR-2219
> Project: Jackrabbit Content Repository
> Issue Type: Improvement
> Components: indexing, jackrabbit-core
> Reporter: Jukka Zitting
> Priority: Minor
> Fix For: 2.0.0
>
> Attachments: JCR-2219.patch
>
>
> As recently discussed on the mailing list (see
> http://markmail.org/message/syt7lc2guzapt7la), the current approach to text
> extraction in background threads doesn't work well, especially with the
> Tika-based extractors that support streamed parsing of many document types.
> Also, *all* of the extracted text streams are currently buffered into
> Strings before being passed into the Lucene index. It would be good if we
> could somehow get back to passing just Readers to Lucene.
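
The difference between buffering extracted text into a String and handing a Reader to the consumer can be sketched with the JDK alone. This is only an illustration under stated assumptions: the class and method names (ReaderVsString, bufferAll, countTokens) are hypothetical, and the token counter merely stands in for a streaming consumer such as an indexer. It does not use the Jackrabbit, Tika, or Lucene APIs.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration; not code from Jackrabbit or the JCR-2219 patch.
public class ReaderVsString {

    // Buffered approach: drain the whole stream into a single String.
    // Memory use grows with the size of the extracted text.
    static String bufferAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] chunk = new char[1024];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            sb.append(chunk, 0, n);
        }
        return sb.toString();
    }

    // Streamed approach: the consumer pulls fixed-size chunks from the Reader
    // and never holds the full text. Counting whitespace-separated tokens
    // stands in for whatever the real consumer does with the text.
    static int countTokens(Reader reader) throws IOException {
        int tokens = 0;
        boolean inToken = false;
        char[] chunk = new char[1024];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            for (int i = 0; i < n; i++) {
                if (Character.isWhitespace(chunk[i])) {
                    inToken = false;
                } else if (!inToken) {
                    inToken = true;
                    tokens++;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        String text = "extracted text from a document";
        System.out.println(bufferAll(new StringReader(text)).length());
        System.out.println(countTokens(new StringReader(text)));
    }
}
```

In the streamed variant the working memory is bounded by the chunk size rather than by the document size, which is the property the issue description asks to recover.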