[ https://issues.apache.org/jira/browse/CONNECTORS-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992633#comment-13992633 ]
Shinichiro Abe commented on CONNECTORS-936:
-------------------------------------------

You cannot post to Solr without binary data, because Solr's ExtractingRequestHandler requires binary data to extract content from. You have to call rd.setBinary with length > 0.
I'm using a customized CmisRepositoryConnector, and as a workaround I post a temporary text consisting of a single space for folder CMIS objects.
Hope this helps.

> RepositoryDocuments with binaryFieldData = null causes issues with solr
> -----------------------------------------------------------------------
>
>                 Key: CONNECTORS-936
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-936
>             Project: ManifoldCF
>          Issue Type: Bug
>    Affects Versions: ManifoldCF 1.6
>            Reporter: Cetra Free
>            Priority: Minor
>
> If a RepositoryDocument is ingested into an activity without an InputStream
> set using the setBinary method, it causes errors with the solr output
> connector:
> {code}
> java.lang.IllegalArgumentException: Input stream may not be null
>         at org.apache.http.util.Args.notNull(Args.java:48)
>         at org.apache.http.entity.mime.content.InputStreamBody.<init>(InputStreamBody.java:70)
>         at org.apache.http.entity.mime.content.InputStreamBody.<init>(InputStreamBody.java:58)
>         at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:201)
>         at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>         at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>         at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:951)
> {code}
> This can be replicated by trying to ingest documents from a CMIS repository
> which contain no content.
> The dirty workaround I've come up with is just to provide a Null Input Stream
> In *CmisRepositoryConnector.java*:
> Import NullInputStream from commons:
> {code}
> import org.apache.commons.io.input.NullInputStream;
> {code}
> And change:
> {code}
> if(fileLength>0 && document.getContentStream()!=null){
>   is = document.getContentStream().getStream();
>   rd.setBinary(is, fileLength);
> }
> {code}
> To:
> {code}
> if(fileLength>0 && document.getContentStream()!=null){
>   is = document.getContentStream().getStream();
>   rd.setBinary(is, fileLength);
> } else {
>   rd.setBinary(new NullInputStream(0),0);
> }
> {code}
> I'm not sure what the correct fix would be. Possibly change the
> *RepositoryDocument* class or handle the situation correctly in the Solr
> connector.
> It doesn't seem to be an issue with other repository connectors, such as
> FileConnector, as they always provide an InputStream.
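
For reference, the one-space workaround described in the comment could be sketched roughly as follows. This is only an illustration, not the actual connector code: `BinaryFallback`, `Binary`, and `orPlaceholder` are hypothetical names that do not exist in ManifoldCF, and the sketch uses only the JDK (`ByteArrayInputStream`) rather than Commons IO. It assumes the caller would pass the resulting stream and length on to `rd.setBinary(...)`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of ManifoldCF): never hands back a null or empty binary. */
final class BinaryFallback {

    /** Hypothetical holder for the (stream, length) pair that would go to rd.setBinary(...). */
    static final class Binary {
        final InputStream stream;
        final long length;

        Binary(InputStream stream, long length) {
            this.stream = stream;
            this.length = length;
        }
    }

    // A single space, so the posted binary has length > 0.
    private static final byte[] PLACEHOLDER = " ".getBytes(StandardCharsets.UTF_8);

    /** Returns the real content if present and non-empty, otherwise a one-byte placeholder. */
    static Binary orPlaceholder(InputStream content, long length) {
        if (content != null && length > 0) {
            return new Binary(content, length);
        }
        return new Binary(new ByteArrayInputStream(PLACEHOLDER), PLACEHOLDER.length);
    }

    public static void main(String[] args) throws IOException {
        // A folder-like object with no content stream falls back to the placeholder.
        Binary b = orPlaceholder(null, 0);
        System.out.println("length=" + b.length + ", first byte=" + b.stream.read());
    }
}
```

In the connector, the existing `if`/`else` could then collapse to a single `rd.setBinary(b.stream, b.length)` call. Whether indexing a one-space body for contentless objects is acceptable depends on the deployment.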