[ 
https://issues.apache.org/jira/browse/CONNECTORS-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042098#comment-14042098
 ] 

Karl Wright edited comment on CONNECTORS-981 at 6/24/14 1:28 PM:
-----------------------------------------------------------------

Hi Alessandro,

I'm afraid I disagree; rather than the primary content being some metadata with 
an arbitrary name, it should remain as primary content.  After the Tika 
Extractor, the stream *has* been converted to text/plain charset utf-8.  But:
- It's a stream, not a string, because it may be quite large
- Even if it were metadata, it would be a Reader, not a string, and converting 
it to a string before indexing would be a bad idea.

Surely SolrInputDocument has provision for handling a character stream?  If not, 
it's not a good abstraction for us to be using.





> Solr Connector - classic Solrj SolrInputDocument support
> --------------------------------------------------------
>
>                 Key: CONNECTORS-981
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-981
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Alessandro Benedetti
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>         Attachments: CONNECTORS-981.patch
>
>
> The Solr connector, in line with the development of the Tika Connector 
> processor, should be able to operate in two ways:
> 1) as usual
> 2) using the classic Solrj SolrInputDocument approach with already-extracted 
> metadata
> To allow the choice, a flag will be added to the mapping tab of the UI (as 
> it relates to how the fields will be processed).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
