[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740992#comment-16740992 ]
Subasini Rath commented on CONNECTORS-1563:
-------------------------------------------

Thanks, Karl. I just need to clear up one more doubt. I need to pass one custom field and value from ManifoldCF that I want to see in the Solr index. That is why I used the metadata transformer: it lets me pass the custom field in the job's Metadata Adjuster tab. If I use only the Tika extractor, is there any way to pass a custom field so that it gets indexed in Solr?

On 11-Jan-2019 11:17 PM, "Karl Wright (JIRA)" <j...@apache.org> wrote:

[ https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740587#comment-16740587 ]

Karl Wright commented on CONNECTORS-1563:
-----------------------------------------

The metadata extractor can go anywhere in your pipeline, after Tika extraction. There is absolutely no point in having *two* Tika extractions, though, and that's what you're trying to do with the setup you've got.

What I'd recommend is that you use only the ManifoldCF-side Tika extractor, and inject content into Solr using the /update handler, not the /update/extract handler. There's also a checkbox you'd need to uncheck in the Solr connection configuration. It's all covered in the ManifoldCF end user documentation.

-- 
This message was sent by Atlassian JIRA (v7.6.3#76005)

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
> -----------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1563
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
>             Project: ManifoldCF
>          Issue Type: Task
>          Components: Lucene/SOLR connector
>            Reporter: Sneha
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem: when I check the "Use the Extract Update Handler:" parameter, I get an error on Solr:
> null:org.apache.solr.common.SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
> If I ignore the Tika exception, my documents get indexed but don't have a content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance