[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

Subasini Rath (JIRA) Fri, 11 Jan 2019 20:43:14 -0800


    [ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740992#comment-16740992
 ]

Subasini Rath commented on CONNECTORS-1563:
-------------------------------------------

Thanks Karl.  I have one more question.  I need to pass one custom field and 
value from ManifoldCF that I want to see in the Solr index.  That is why I used 
the metadata transformer, where I can set the custom field in the job's 
Metadata Adjuster tab.
If I use only the Tika extractor, is there any way to pass a custom field that 
will get indexed in Solr?

On 11-Jan-2019 11:17 PM, "Karl Wright (JIRA)" <j...@apache.org> wrote:

    [ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740587#comment-16740587
 ]

Karl Wright commented on CONNECTORS-1563:
-----------------------------------------

The metadata extractor can go anywhere in your pipeline, after Tika extraction. 
 There is absolutely no point in having *two* Tika extractions though -- and 
that's what you're trying to do with the setup you've got.

What I'd recommend is that you use only the ManifoldCF-side Tika extractor, and 
inject content into Solr using the /update handler, not the /update/extract 
handler.  There's also a checkbox you'd need to uncheck in the Solr connection 
configuration. It's all covered in the ManifoldCF end user documentation.
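
For illustration only, here is a rough Python sketch of what a plain /update 
request might look like once the text has already been extracted on the 
ManifoldCF side. ManifoldCF builds and sends these requests itself, so this is 
not its implementation. The Solr base URL, the collection name 
"mycollection", the field names "content" and "custom_field", and the helper 
build_update_request are all assumptions made for the example, not details 
from this issue.

```python
import json
import urllib.request


def build_update_request(solr_base, collection, doc):
    """Build (but do not send) a JSON POST for Solr's /update handler.

    Hypothetical helper: the document is sent as already-extracted fields,
    so Solr does not need to run Tika (Solr Cell) on it.
    """
    url = f"{solr_base.rstrip('/')}/{collection}/update?commit=true"
    # The /update handler accepts a JSON array of documents.
    body = json.dumps([doc]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed field names; they would have to exist in the Solr schema.
doc = {
    "id": "doc-1",
    "content": "text already extracted by Tika",
    "custom_field": "custom value",
}
request = build_update_request("http://localhost:8983/solr", "mycollection", doc)
# urllib.request.urlopen(request) would send it to a running Solr instance.
```

The point of the sketch is that content and any extra metadata travel 
together as ordinary fields, which is why a second extraction on the Solr side 
is unnecessary.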

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> -----------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1563
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
>             Project: ManifoldCF
>          Issue Type: Task
>          Components: Lucene/SOLR connector
>            Reporter: Sneha
>            Assignee: Karl Wright
>            Priority: Major
>         Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> When I check the "Use the Extract Update Handler:" parameter, I get an 
> error on Solr, i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a 
> content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika 
> extractor in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance.
