[ http://issues.apache.org/jira/browse/JCR-415?page=all ]
Jukka Zitting updated JCR-415:
------------------------------
Attachment: jackrabbit-extractor-r420472.patch
Attached is a proposed patch containing a mostly complete implementation of the
TextExtractor idea I briefly discussed on the mailing list. It covers only part
of this issue, but should considerably simplify the remaining work.
The attached patch (jackrabbit-extractor-r420472.patch) contains just the
TextExtractor interface and related classes placed in
org.apache.jackrabbit.extractor. I placed them outside o.a.j.core because they
have no Jackrabbit dependencies and would probably make a good contribution to
Apache Lucene once battle-tested.
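For readers who have not opened the patch, the extractor contract can be pictured
roughly as follows. This is a minimal sketch, not the patch itself: the method
signatures and the PlainTextExtractor and CompositeTextExtractor classes are
assumptions for illustration. What it shows is that the contract depends only on
java.io, not on any Jackrabbit type.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Assumed shape of the extractor contract; no Jackrabbit types involved. */
interface TextExtractor {
    /** MIME types this extractor can handle. */
    String[] getContentTypes();

    /** Returns a Reader over the text content of the given binary stream. */
    Reader extractText(InputStream stream, String type, String encoding) throws IOException;
}

/** Hypothetical text/plain extractor: decodes with the given encoding, else UTF-8. */
class PlainTextExtractor implements TextExtractor {
    public String[] getContentTypes() {
        return new String[] { "text/plain" };
    }

    public Reader extractText(InputStream stream, String type, String encoding) {
        Charset cs = (encoding != null) ? Charset.forName(encoding) : StandardCharsets.UTF_8;
        return new InputStreamReader(stream, cs);
    }
}

/** Hypothetical dispatcher: picks the extractor registered for a content type. */
class CompositeTextExtractor implements TextExtractor {
    private final Map<String, TextExtractor> byType = new HashMap<>();

    void addTextExtractor(TextExtractor extractor) {
        for (String type : extractor.getContentTypes()) {
            byType.put(type, extractor);
        }
    }

    public String[] getContentTypes() {
        return byType.keySet().toArray(new String[0]);
    }

    public Reader extractText(InputStream stream, String type, String encoding) throws IOException {
        TextExtractor extractor = byType.get(type);
        if (extractor == null) {
            stream.close(); // no extractor can read it, so release it right away
            return new StringReader(""); // unknown types yield empty text
        }
        return extractor.extractText(stream, type, encoding);
    }
}
```

Returning an empty Reader for unsupported types keeps callers free of null checks.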
I'll follow up with a separate patch that replaces the current TextFilter usage
in o.a.j.core.query[.lucene] in a backwards-compatible way, and with some class
diagrams that give a quick overview before diving into the javadocs.
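One way such a backwards-compatible replacement could work is an adapter that
exposes an existing filter through the new interface. In this sketch,
LegacyTextFilter is a hypothetical stand-in for the current TextFilter (the real
signatures differ), TextExtractor is redefined so the example compiles on its
own, and the fulltext field name passed to the adapter is a placeholder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;

/** Minimal extractor contract, redefined here so the sketch compiles on its own. */
interface TextExtractor {
    String[] getContentTypes();
    Reader extractText(InputStream stream, String type, String encoding) throws IOException;
}

/** Hypothetical stand-in for an existing filter that maps field names to Readers. */
interface LegacyTextFilter {
    boolean canFilter(String mimeType);
    Map<String, Reader> doFilter(InputStream data, String encoding) throws IOException;
}

/** Hypothetical adapter: exposes a legacy filter as a TextExtractor. */
class TextFilterExtractor implements TextExtractor {
    private final LegacyTextFilter filter;
    private final String[] types;
    private final String fulltextField;

    TextFilterExtractor(LegacyTextFilter filter, String[] types, String fulltextField) {
        this.filter = filter;
        this.types = types.clone();
        this.fulltextField = fulltextField;
    }

    public String[] getContentTypes() {
        return types.clone();
    }

    public Reader extractText(InputStream stream, String type, String encoding) throws IOException {
        if (!filter.canFilter(type)) {
            stream.close();
            return new StringReader("");
        }
        Map<String, Reader> fields = filter.doFilter(stream, encoding);
        // The legacy filter chose field names; the adapter keeps only the
        // fulltext Reader and closes the ones it discards.
        Reader text = null;
        for (Map.Entry<String, Reader> entry : fields.entrySet()) {
            if (entry.getKey().equals(fulltextField)) {
                text = entry.getValue();
            } else {
                entry.getValue().close();
            }
        }
        return (text != null) ? text : new StringReader("");
    }
}
```

Existing filters keep working unchanged, while field naming moves out of them.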
> Enhance indexing of binary content
> ----------------------------------
>
> Key: JCR-415
> URL: http://issues.apache.org/jira/browse/JCR-415
> Project: Jackrabbit
> Type: Improvement
> Components: indexing
> Versions: 1.0, 1.0.1, 0.9
> Reporter: Marcel Reutegger
> Priority: Minor
> Fix For: 1.1
> Attachments: jackrabbit-extractor-r420472.patch
>
> Indexing of binary content should be enhanced to allow either configuration of
> which fields are indexed or better support for custom NodeIndexer
> implementations.
> The current design has a couple of flaws that should be addressed at the same
> time:
> - Reader instances are requested from the text filters even though the reader
> might never be used
> - Only jcr:data properties of nt:resource nodes are fulltext indexed
> - It is up to the text filter implementation to decide the Lucene field name
> for the text representation; this responsibility should be moved to the
> NodeIndexer. A text filter should only provide a Reader instance.
> With these changes, a custom NodeIndexer can decide whether a binary property
> has one or more representations in the index.
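The first and third points in the issue description can be illustrated together:
the indexer receives a Reader that is only created when first read, and the
indexer, not the text filter, chooses the field names, so one binary property
can get several index representations. This is a minimal sketch assuming a
Supplier<Reader> as the hook into text extraction; LazyReader,
BinaryPropertyIndexer, IndexedField and the FULLTEXT field names are
hypothetical and do not come from Jackrabbit or Lucene.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical Reader that defers creating its delegate until the first read. */
class LazyReader extends Reader {
    private final Supplier<Reader> factory; // assumed hook, e.g. wrapping a text extractor
    private Reader delegate;

    LazyReader(Supplier<Reader> factory) {
        this.factory = factory;
    }

    private Reader delegate() {
        if (delegate == null) {
            delegate = factory.get(); // extraction happens here, only if someone reads
        }
        return delegate;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        return delegate().read(buf, off, len);
    }

    @Override
    public void close() throws IOException {
        if (delegate != null) {
            delegate.close(); // never-read readers have nothing to close
        }
    }
}

/** A (field name, text) pair; stands in for a Lucene field to stay dependency-free. */
final class IndexedField {
    final String name;
    final Reader text;

    IndexedField(String name, Reader text) {
        this.name = name;
        this.text = text;
    }
}

/** Hypothetical indexer: it decides field names and may emit several per property. */
class BinaryPropertyIndexer {
    List<IndexedField> index(String propertyName, Supplier<Reader> extractor) {
        List<IndexedField> fields = new ArrayList<>();
        // Generic fulltext field shared by all properties...
        fields.add(new IndexedField("FULLTEXT", new LazyReader(extractor)));
        // ...plus a property-specific field, so queries can target this property alone.
        fields.add(new IndexedField("FULLTEXT:" + propertyName, new LazyReader(extractor)));
        return fields;
    }
}
```

Because each field wraps its own LazyReader, extraction runs once per field that
is actually consumed; a real implementation might cache the extracted text instead.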