[jira] [Commented] (CONNECTORS-936) RepositoryDocuments with binaryFieldData = null causes issues with solr

Shinichiro Abe (JIRA) Thu, 15 May 2014 10:15:01 -0700

    [ https://issues.apache.org/jira/browse/CONNECTORS-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992633#comment-13992633 ]


Shinichiro Abe commented on CONNECTORS-936:
-------------------------------------------

You cannot post to Solr without binary data, because Solr's 
ExtractingRequestHandler requires binary data to extract content from. You 
have to call rd.setBinary with a length > 0. I'm using a customized 
CmisRepositoryConnector, so as a workaround I post a temporary text consisting 
of a single space for folder CMIS objects. 
Hope this helps.
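
A minimal sketch of that single-space workaround, in case it is useful. The 
class and method names (PlaceholderContent, placeholderStream, 
placeholderLength) are hypothetical and not part of ManifoldCF, and the choice 
of UTF-8 is an assumption. The only ManifoldCF call involved is 
rd.setBinary(InputStream, long), which appears only in a comment so that the 
snippet compiles on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: supplies a one-byte placeholder body for repository
// objects that have no content stream (for example, CMIS folders).
final class PlaceholderContent {
    // A single space, encoded as UTF-8 (assumption), so the length is 1 byte.
    private static final byte[] PLACEHOLDER = " ".getBytes(StandardCharsets.UTF_8);

    private PlaceholderContent() {
    }

    // Returns a fresh stream each time, because a stream can only be read once.
    static InputStream placeholderStream() {
        return new ByteArrayInputStream(PLACEHOLDER);
    }

    // Length in bytes of the placeholder; always greater than zero.
    static long placeholderLength() {
        return PLACEHOLDER.length;
    }

    public static void main(String[] args) {
        // In a connector, the else-branch for content-less objects might read:
        //   rd.setBinary(PlaceholderContent.placeholderStream(),
        //                PlaceholderContent.placeholderLength());
        System.out.println(placeholderLength());
    }
}
```

The stream and the length come from the same byte array, so the length handed 
to rd.setBinary always matches what the stream actually yields.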

> RepositoryDocuments with binaryFieldData = null causes issues with solr
> -----------------------------------------------------------------------
>
>                 Key: CONNECTORS-936
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-936
>             Project: ManifoldCF
>          Issue Type: Bug
>    Affects Versions: ManifoldCF 1.6
>            Reporter: Cetra Free
>            Priority: Minor
>
> If a RepositoryDocument is ingested into an activity without an InputStream 
> set using the setBinary method, it causes errors with the solr output 
> connector:
> {code}
> java.lang.IllegalArgumentException: Input stream may not be null
>       at org.apache.http.util.Args.notNull(Args.java:48)
>       at org.apache.http.entity.mime.content.InputStreamBody.<init>(InputStreamBody.java:70)
>       at org.apache.http.entity.mime.content.InputStreamBody.<init>(InputStreamBody.java:58)
>       at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:201)
>       at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>       at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>       at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:951)
> {code}
> This can be replicated by trying to ingest documents from a CMIS repository 
> which contain no content.
> The dirty workaround I've come up with is just to provide a null input 
> stream.
> In *CmisRepositoryConnector.java*:
> Import NullInputStream from Commons IO:
> {code}
> import org.apache.commons.io.input.NullInputStream;
> {code}
> And change:
> {code}
>           if(fileLength>0 && document.getContentStream()!=null){
>             is = document.getContentStream().getStream();
>             rd.setBinary(is, fileLength);
>           }
> {code}
> To:
> {code}
>           if(fileLength>0 && document.getContentStream()!=null){
>             is = document.getContentStream().getStream();
>             rd.setBinary(is, fileLength);
>           } else {
>             rd.setBinary(new NullInputStream(0),0);
>           }
> {code}
> I'm not sure what the correct fix would be.  Possibly change the 
> *RepositoryDocument* class or handle the situation correctly in the Solr 
> connector.
> It doesn't seem to be an issue with other repository connectors, such as 
> FileConnector, as they always provide an InputStream.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
