[ http://issues.apache.org/jira/browse/JCR-415?page=comments#action_12420293 ]
Marcel Reutegger commented on JCR-415:
--------------------------------------

Jukka wrote:
> I think it would make more design sense to try to postpone the creation of
> the Document instances instead of delaying text extraction. But I'm not too
> familiar with the details, so I'm OK with adding lazy reading to the mix. In
> any case I think it's best to layer the lazy reading on top of the
> TextExtractor interface instead of below it. A utility class like the
> following could achieve this as long as the given InputStream remains valid
> until the document has been read.

Yes, you are right. I thought I could get away with the dirty solution ;)

While going through your patch I was actually also thinking about a design
that should create the document only when it is really added to the index.
For now we can maybe use the TextExtractorReader you proposed and then in a
next step change the design to create the Document in a later stage of the
indexing process.

> Enhance indexing of binary content
> ----------------------------------
>
>          Key: JCR-415
>          URL: http://issues.apache.org/jira/browse/JCR-415
>      Project: Jackrabbit
>         Type: Improvement
>   Components: indexing
>     Versions: 1.0, 1.0.1, 0.9
>     Reporter: Marcel Reutegger
>     Priority: Minor
>      Fix For: 1.1
>  Attachments: jackrabbit-extractor-r420472.patch,
>               jackrabbit-query-r420472.patch,
>               org.apache.jackrabbit.core.query-extractor.jpg,
>               org.apache.jackrabbit.core.query.lucene-extractor.jpg,
>               org.apache.jackrabbit.extractor.jpg
>
> Indexing of binary content should be enhanced in order to allow either
> configuration what fields are indexed or provide better support for custom
> NodeIndexer implementations.
> The current design has a couple of flaws that should be addressed at the same
> time:
> - Reader instances are requested from the text filters even though the reader
>   might never be used
> - only jcr:data properties of nt:resource nodes are fulltext indexed
> - It is up to the text filter implementation to decide the lucene field name
>   for the text representation, responsibility should be moved to the
>   NodeIndexer. A text filter should only provide a Reader instance.
> With those changes a custom NodeIndexer can then decide if a binary property
> has one or more representations in the index.
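
For illustration, the lazy reading discussed in this thread (a Reader layered
on top of the TextExtractor interface, so that no extraction work is done for
a Reader that is never used) could look roughly like the sketch below. This
is not the code from the attached patches. The nested TextExtractor interface
is a hypothetical stand-in whose method shape is assumed, and the class name
LazyExtractionSketch and the helper readAll are made up for the example. As
noted in the quoted comment, the approach only works if the given InputStream
remains valid until the text has actually been read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

class LazyExtractionSketch {

    /** Hypothetical stand-in for the extractor interface named in the thread. */
    interface TextExtractor {
        Reader extractText(InputStream stream, String type, String encoding)
                throws IOException;
    }

    /** Reader that postpones extraction until the first read() call. */
    static class TextExtractorReader extends Reader {
        private final TextExtractor extractor;
        private final InputStream stream;
        private final String type;
        private final String encoding;
        private Reader extracted; // stays null until text is first requested

        TextExtractorReader(TextExtractor extractor, InputStream stream,
                            String type, String encoding) {
            this.extractor = extractor;
            this.stream = stream;
            this.type = type;
            this.encoding = encoding;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            if (extracted == null) {
                // The extractor is invoked here, not in the constructor.
                extracted = extractor.extractText(stream, type, encoding);
            }
            return extracted.read(cbuf, off, len);
        }

        @Override
        public void close() throws IOException {
            if (extracted != null) {
                extracted.close();
            } else {
                // Never read: release the underlying stream directly.
                stream.close();
            }
        }
    }

    /** Drains a reader into a String. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int n;
        while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
            text.append(buffer, 0, n);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        // Trivial extractor for plain text that counts its invocations.
        TextExtractor plainText = (in, type, enc) -> {
            calls[0]++;
            return new InputStreamReader(in, enc);
        };
        InputStream data =
                new ByteArrayInputStream("hello index".getBytes("UTF-8"));
        Reader reader =
                new TextExtractorReader(plainText, data, "text/plain", "UTF-8");
        System.out.println("extractions before read: " + calls[0]);
        System.out.println("text: " + readAll(reader));
        System.out.println("extractions after read: " + calls[0]);
        reader.close();
    }
}
```

A wrapper of this kind leaves the extractor interface itself untouched, which
matches the suggestion to layer lazy reading on top of it rather than below
it. Postponing the creation of the Document would be a separate change.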