[
https://issues.apache.org/jira/browse/CXF-5549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031641#comment-14031641
]
Andriy Redko commented on CXF-5549:
-----------------------------------
As agreed, the first step is to create an extractor and a test that:
- extract metadata and raw content from the provided PDF document using Apache
Tika
- use Apache Lucene (RAM index / StandardAnalyzer) to index the metadata and
content
- use LuceneQueryVisitor on top of the filter expression and the index to match
the documents
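
To make the flow of the three steps concrete, here is a minimal sketch that
uses only the Java standard library. It is not the Tika, Lucene or CXF API:
ExtractedDocument stands in for what a Tika parse would produce, InMemoryIndex
stands in for a Lucene RAM index with a simple lower-casing tokenizer, and
search() accepts only a single "field==value" term as a stand-in for a parsed
filter expression. All class, method and field names (including "contents")
are hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class PdfSearchSketch {

    /** Step 1 stand-in: the metadata and raw text an extractor would return. */
    static final class ExtractedDocument {
        final Map<String, String> metadata;
        final String content;

        ExtractedDocument(Map<String, String> metadata, String content) {
            this.metadata = metadata;
            this.content = content;
        }
    }

    /** Step 2 stand-in: an in-memory index of tokens per field, per document. */
    static final class InMemoryIndex {
        // document id -> (field name -> set of lower-cased tokens)
        private final Map<String, Map<String, Set<String>>> docs = new LinkedHashMap<>();

        void add(String docId, ExtractedDocument doc) {
            Map<String, Set<String>> fields = new HashMap<>();
            for (Map.Entry<String, String> e : doc.metadata.entrySet()) {
                fields.put(e.getKey(), tokenize(e.getValue()));
            }
            // "contents" is an assumed field name for the raw text.
            fields.put("contents", tokenize(doc.content));
            docs.put(docId, fields);
        }

        /** Step 3 stand-in: match documents against one "field==value" term. */
        List<String> search(String expression) {
            String[] parts = expression.split("==", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("expected field==value: " + expression);
            }
            String field = parts[0].trim();
            Set<String> wanted = tokenize(parts[1]);
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, Map<String, Set<String>>> e : docs.entrySet()) {
                Set<String> tokens = e.getValue().get(field);
                // A document matches when the field holds every requested token.
                if (tokens != null && !wanted.isEmpty() && tokens.containsAll(wanted)) {
                    hits.add(e.getKey());
                }
            }
            return hits;
        }

        // Lower-case and split on anything that is not a letter or digit.
        private static Set<String> tokenize(String text) {
            Set<String> tokens = new HashSet<>();
            for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();

        Map<String, String> meta = new HashMap<>();
        meta.put("author", "Jane Doe"); // made-up sample metadata
        index.add("guide.pdf", new ExtractedDocument(meta, "Searching binary data with an index"));
        index.add("notes.pdf", new ExtractedDocument(new HashMap<>(), "Unrelated meeting notes"));

        System.out.println(index.search("contents==index"));
        System.out.println(index.search("author==jane"));
    }
}
```

In the real extractor and test, the stand-ins would be replaced by Tika for
step 1, a Lucene RAM index with StandardAnalyzer for step 2, and
LuceneQueryVisitor applied to the parsed filter expression for step 3.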
Thanks.
Andriy
> Introduce Tika Search Visitor
> -----------------------------
>
> Key: CXF-5549
> URL: https://issues.apache.org/jira/browse/CXF-5549
> Project: CXF
> Issue Type: New Feature
> Components: JAX-RS
> Reporter: Sergey Beryozkin
> Assignee: Andriy Redko
> Priority: Minor
>
> Introduce TikaSearchVisitor, which will convert a FIQL (or similar) search
> expression into an Apache Tika component that can be used to search binary
> data; for example, a service could support something like "find all PDF files
> matching a given expression".