[
https://issues.apache.org/jira/browse/CONNECTORS-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042098#comment-14042098
]
Karl Wright edited comment on CONNECTORS-981 at 6/24/14 1:28 PM:
-----------------------------------------------------------------
Hi Alessandro,
I'm afraid I disagree; rather than the primary content being some metadata with
an arbitrary name, it should remain as primary content. After the Tika
Extractor, the stream *has* been converted to text/plain charset utf-8. But:
- It's a stream, not a string, because it may be quite large.
- Even if it were metadata, it would be a Reader, not a string, and converting
it to a string before indexing would be a bad idea.
Surely SolrInputDocument has provision for handling a character stream? If not,
it's not a good abstraction for us to be using.
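To illustrate the stream-versus-string point, here is a minimal sketch using only java.io. It is not ManifoldCF or SolrJ code: the StreamedContent class and the CharSink interface are hypothetical stand-ins for whatever the indexing side accepts. It only shows that UTF-8 text can be passed along in fixed-size chunks through a Reader, so memory use stays bounded regardless of document size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class StreamedContent {
    /** Hypothetical consumer of character chunks (an assumption, not a real Solr API). */
    public interface CharSink {
        void accept(char[] buffer, int length) throws IOException;
    }

    /**
     * Decodes the given UTF-8 byte stream and hands it to the sink in chunks
     * of at most 8192 chars, never building the whole text as one String.
     * Returns the total number of chars passed to the sink.
     */
    public static long copy(InputStream utf8Text, CharSink sink) throws IOException {
        long total = 0;
        char[] buffer = new char[8192];
        try (Reader reader = new InputStreamReader(utf8Text, StandardCharsets.UTF_8)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sink.accept(buffer, n);
                total += n;
            }
        }
        return total;
    }
}
```

Converting to a String first would instead require the entire extracted text to fit in memory at once, which is the concern with large documents.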
> Solr Connector - classic Solrj SolrInputDocument support
> --------------------------------------------------------
>
> Key: CONNECTORS-981
> URL: https://issues.apache.org/jira/browse/CONNECTORS-981
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Lucene/SOLR connector
> Affects Versions: ManifoldCF 1.7
> Reporter: Alessandro Benedetti
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.7
>
> Attachments: CONNECTORS-981.patch
>
>
> The Solr connector, in line with the development of the Tika Connector
> processor, should be able to operate in two ways:
> 1) as usual
> 2) using the classic Solrj SolrInputDocument approach with already extracted
> metadata
> To allow the choice, a flag will be added in the UI in the mapping tab (as
> it relates to how the fields will be processed)
--
This message was sent by Atlassian JIRA
(v6.2#6252)