[
https://issues.apache.org/jira/browse/CONNECTORS-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042728#comment-14042728
]
Karl Wright edited comment on CONNECTORS-981 at 6/24/14 11:41 PM:
------------------------------------------------------------------
So, let us talk about solutions. I think there are two possibilities:
(1) Use SolrInputDocument and also modify the Solr Connector to have a
user-settable length limit.
OR
(2) Continue to use ContentStreamUpdateRequest in HttpPoster, but modify the
code to expect the RepositoryDocument to contain a UTF-8-encoded input stream.
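To make option (1) concrete, here is a minimal sketch of what a length limit could look like when the document content has to be turned into a string field value. The class name ContentPreparation, the method readBounded, and the maxChars parameter are hypothetical, made up for illustration; none of this is existing ManifoldCF or SolrJ code, and it uses only the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper, for illustration only; not part of ManifoldCF or SolrJ.
final class ContentPreparation {
  private ContentPreparation() {}

  // Decodes the stream with the given charset and returns at most maxChars
  // characters. Anything beyond the limit is left unread in the stream.
  // Assumption: maxChars is the user-settable limit from the connector config.
  static String readBounded(InputStream in, Charset charset, int maxChars)
      throws IOException {
    StringBuilder sb = new StringBuilder();
    Reader reader = new InputStreamReader(in, charset);
    char[] buf = new char[4096];
    while (sb.length() < maxChars) {
      // Never ask for more characters than the remaining budget.
      int n = reader.read(buf, 0, Math.min(buf.length, maxChars - sb.length()));
      if (n < 0) {
        break; // end of stream reached before the limit
      }
      sb.append(buf, 0, n);
    }
    return sb.toString();
  }
}
```

The resulting string could then be set as a field value on a SolrInputDocument. Note that a hard character cut like this can split a surrogate pair at the boundary, so a real implementation would probably want to handle that case.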
It's worth noting that the HttpSolrServer.add(SolrInputDocument) method does
the following:
{code}
public UpdateResponse add(SolrInputDocument doc, int commitWithinMs)
    throws SolrServerException, IOException {
  UpdateRequest req = new UpdateRequest();
  req.add(doc);
  req.setCommitWithin(commitWithinMs);
  return req.process(this);
}
{code}
Since both ContentStreamUpdateRequest and UpdateRequest are extensions of
AbstractUpdateRequest, and AbstractUpdateRequest is where content stream
support lives, there may be a way to do this by adding a content stream to an
UpdateRequest object directly. I'll have to look deeper at the UpdateRequest
code to see if that has any chance of working.
was (Author: [email protected]):
So, let us talk about solutions. I think there are two possibilities:
(1) Use SolrInputDocument and also modify the Solr Connector to have a
user-settable length limit.
OR
(2) Continue to use ContentStreamUpdateRequest in HttpPoster, but modify the
code to expect the RepositoryDocument to contain a UTF-8-encoded input stream.
It's worth noting that the HttpSolrServer.add(SolrInputDocument) method does
the following:
{code}
public UpdateResponse add(SolrInputDocument doc, int commitWithinMs)
    throws SolrServerException, IOException {
  UpdateRequest req = new UpdateRequest();
  req.add(doc);
  req.setCommitWithin(commitWithinMs);
  return req.process(this);
}
{code}
Since both ContentStreamUpdateRequest and UpdateRequest are extensions of
AbstractUpdateRequest, it is perfectly reasonable to continue to use
ContentStreamUpdateRequest instead of trying to force everything into
SolrInputDocument. And that way, the problem is effectively solved. The only
thing you'd want to do is research the differences between UpdateRequest and
ContentStreamUpdateRequest to be sure that we pick the same target URL.
> Solr Connector - classic Solrj SolrInputDocument support
> --------------------------------------------------------
>
> Key: CONNECTORS-981
> URL: https://issues.apache.org/jira/browse/CONNECTORS-981
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Lucene/SOLR connector
> Affects Versions: ManifoldCF 1.7
> Reporter: Alessandro Benedetti
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.7
>
> Attachments: CONNECTORS-981.patch
>
>
> The Solr connector, in line with the development of the Tika Connector
> processor, should be able to operate in two ways:
> 1) as usual
> 2) using the classic Solrj SolrInputDocument approach with already extracted
> metadata
> To allow the choice, a flag will be added in the UI in the mapping tab (as
> it relates to how the fields will be processed).