[ 
https://issues.apache.org/jira/browse/CONNECTORS-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042156#comment-14042156
 ] 

Alessandro Benedetti commented on CONNECTORS-981:
-------------------------------------------------

I follow you, but let's analyze what Tika does in the Extract Update Handler:
it extracts the stream and puts the result in a Solr field ("content"), which
is a string.

So, by using the Solr Connector in "No Extract" mode, you are saying that the
content has already been extracted, and I don't see the problem with having it
in a string.
I guess that, when using the Tika Extractor, it would be normal to have the
String copy of the binary stream in the Repo Document, to be then processed by
the OutputConnector.
This is how Solr works, and I suppose it should be the correct behaviour when
you choose not to use the extract request handler.
But if you think it's better, I can add a line in the Solr Connector that
transforms the binary stream to a String and then creates the field in the
SolrInputDocument.
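
To make that last option concrete, here is a minimal sketch of the conversion
step, not the actual patch. The class and method names are hypothetical, the
UTF-8 charset is an assumption (the real stream encoding would have to come
from the repository document), and a plain Map stands in for the
SolrInputDocument so that the example runs without SolrJ on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of ManifoldCF or SolrJ.
class BinaryStreamToField {

  // Reads the whole stream into memory and decodes it with the given charset.
  static String readAsString(InputStream in, Charset charset) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    int n;
    while ((n = in.read(chunk)) != -1) {
      buffer.write(chunk, 0, n);
    }
    return new String(buffer.toByteArray(), charset);
  }

  // Stand-in for a SolrInputDocument: an ordered map of field name to value.
  // The "content" field name follows the comment above; "id" is an assumption.
  static Map<String, Object> toDocument(String id, InputStream body, Charset charset)
      throws IOException {
    Map<String, Object> doc = new LinkedHashMap<String, Object>();
    doc.put("id", id);
    doc.put("content", readAsString(body, charset));
    return doc;
  }

  public static void main(String[] args) throws IOException {
    // Assumed input: text that was already extracted upstream, encoded as UTF-8.
    InputStream body = new ByteArrayInputStream(
        "already extracted text".getBytes(StandardCharsets.UTF_8));
    System.out.println(toDocument("doc-1", body, StandardCharsets.UTF_8));
  }
}
```

With SolrJ available, the map would be replaced by a SolrInputDocument and
the put calls by addField("content", text). Note that this sketch buffers the
whole stream in memory, which may not be acceptable for very large documents.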



> Solr Connector - classic Solrj SolrInputDocument support
> --------------------------------------------------------
>
>                 Key: CONNECTORS-981
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-981
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Alessandro Benedetti
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>         Attachments: CONNECTORS-981.patch
>
>
> The Solr connector, in line with the development of the Tika Connector 
> processor, should be able to operate in 2 ways:
> 1) as usual
> 2) using the classic Solrj SolrInputDocument approach with already extracted 
> metadata
> To allow the choice, a flag will be added in the UI in the mapping tab (as 
> it's related to how the fields will be processed).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
