[jira] Commented: (JCR-2219) Improved background text extraction

Marcel Reutegger (JIRA) Wed, 05 Aug 2009 12:30:45 -0700

    [ https://issues.apache.org/jira/browse/JCR-2219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739690#action_12739690 ]


Marcel Reutegger commented on JCR-2219:
---------------------------------------

Re-applied some of the 801135 changes to make test execution more reliable.

svn revision: 801375

> Improved background text extraction
> -----------------------------------
>
>                 Key: JCR-2219
>                 URL: https://issues.apache.org/jira/browse/JCR-2219
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: indexing, jackrabbit-core
>            Reporter: Jukka Zitting
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: JCR-2219.patch
>
>
> As recently discussed on the mailing list (see 
> http://markmail.org/message/syt7lc2guzapt7la), the current approach to text 
> extraction in background threads doesn't work that well, especially with the 
> Tika-based extractors that support streamed parsing of many document types.
> Also, currently *all* of the extracted text streams are buffered into 
> Strings before being passed into the Lucene index. It would be good if we 
> could somehow get back to passing just Readers to Lucene.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
