[ https://issues.apache.org/jira/browse/CONNECTORS-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042156#comment-14042156 ]
Alessandro Benedetti commented on CONNECTORS-981:
-------------------------------------------------

I follow you, but let's analyze what Tika does in the Extract Update Handler: it extracts the stream and puts it in a Solr field ("content"), which is a string.
So when you use the Solr Connector in "No Extract" mode, you are saying the content has already been extracted, and I don't see the problem with having it as a string.
I guess it will be normal, when using the Tika extractor, to have the String copy of the binary stream in the Repo Document, to be then processed by the Output Connector.
This is how Solr works, and I suppose that should be the correct behaviour when you choose not to use the extract request handler.
But if you think it's better, I can add a line in the Solr Connector that transforms the binary stream to a String and then creates the field in the SolrInputDocument.

> Solr Connector - classic Solrj SolrInputDocument support
> --------------------------------------------------------
>
>                 Key: CONNECTORS-981
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-981
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Lucene/SOLR connector
>    Affects Versions: ManifoldCF 1.7
>            Reporter: Alessandro Benedetti
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.7
>
>         Attachments: CONNECTORS-981.patch
>
>
> The solr connector, according with the development of the Tika Connector
> processor, should be able to operate in 2 ways :
> 1) as usual
> 2) using the classic Solrj SolrInputDocument approach with already extracted
> metadata
> To allow the choice a flag will be added in the UI in the mapping tab ( as
> it's related with how the fields will be processed)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
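
For reference, the conversion proposed at the end of the comment (binary stream to String, then a field on the SolrInputDocument) could be sketched roughly as follows. This is only an illustration and not the code in CONNECTORS-981.patch: the class name `StreamToText`, the method `readAsString`, and the use of a caller-supplied charset are all assumptions, and it assumes the stream already holds extracted text rather than raw binary content.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper, not part of ManifoldCF or SolrJ.
public class StreamToText {

    // Reads the whole stream into memory and decodes it with the given charset.
    // Assumes the stream carries already-extracted text (e.g. produced by Tika).
    public static String readAsString(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // Copy until end of stream; read() returns -1 when no more bytes remain.
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }
}
```

The resulting string could then be attached to the document with SolrJ's `SolrInputDocument.addField("content", text)`; the field name "content" is taken from the comment above. Buffering the whole stream in memory is a simplification that may not suit very large documents.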