[
https://issues.apache.org/jira/browse/CONNECTORS-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963987#comment-13963987
]
Karl Wright commented on CONNECTORS-920:
----------------------------------------
There are a couple of problems with this.
First, the user is pointing at a part of the code that merely logs the activity
and rethrows the exception. The activity status is most certainly "FAILED",
not "SOLR REJECT", here, because the cause of SocketTimeoutException is not
typically a rejected document.
But the user's complaint, that the crawl is stopped, also seems hard
to fathom. The catch that actually does the work calls
handleSolrServerException(), which should in turn call
handleIOException() for a nested IO exception. That, in turn, looks for
various special conditions, but in the case of SocketTimeoutException, which
is not explicitly checked for, throws a ServiceInterruption.
I propose modifying the code to explicitly look for SocketTimeoutException in
handleIOException(), and throw a specific ServiceInterruption for that case.
But unless the document consistently fails to index for many retries, I can't
see how the described behavior can take place.
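A minimal sketch of the proposed change follows. It is not the actual
HttpPoster code: the ServiceInterruption class here is a simplified stand-in
for the ManifoldCF one, and the class name, retry constants, and method
signatures are assumptions made for illustration only. It shows the nested
IOException being located in the cause chain and SocketTimeoutException
getting its own ServiceInterruption, so that the document is retried.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical, self-contained sketch; names and values are illustrative only.
class TimeoutHandlingSketch {

  /** Simplified stand-in for ManifoldCF's ServiceInterruption. */
  static class ServiceInterruption extends Exception {
    final long retryTime; // when the document should be retried (ms)
    final long failTime;  // when retries should be abandoned (ms)

    ServiceInterruption(String message, Throwable cause, long retryTime, long failTime) {
      super(message, cause);
      this.retryTime = retryTime;
      this.failTime = failTime;
    }
  }

  // Assumed values, not taken from the connector.
  static final long TIMEOUT_RETRY_INTERVAL_MS = 5L * 60L * 1000L;
  static final long GIVE_UP_AFTER_MS = 2L * 60L * 60L * 1000L;

  /** Returns the first IOException in the cause chain, or null if there is none. */
  static IOException findNestedIOException(Throwable t) {
    // The depth bound guards against pathological cause chains.
    for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
      if (t instanceof IOException) {
        return (IOException) t;
      }
    }
    return null;
  }

  /** Maps an IOException to a ServiceInterruption, with an explicit timeout case. */
  static void handleIOException(IOException e, long now) throws ServiceInterruption {
    if (e instanceof SocketTimeoutException) {
      // Explicit case: a timeout is transient, so schedule a retry.
      throw new ServiceInterruption("Socket timeout exception: " + e.getMessage(), e,
          now + TIMEOUT_RETRY_INTERVAL_MS, now + GIVE_UP_AFTER_MS);
    }
    // Other special conditions would be checked here; this is the fallback.
    throw new ServiceInterruption("IO exception: " + e.getMessage(), e,
        now + TIMEOUT_RETRY_INTERVAL_MS, now + GIVE_UP_AFTER_MS);
  }

  /** Unwraps a server-side exception and delegates nested IO problems. */
  static void handleSolrServerException(Exception e, long now) throws ServiceInterruption {
    IOException nested = findNestedIOException(e);
    if (nested != null) {
      handleIOException(nested, now);
    }
    // No nested IO exception: treat as a hard failure in this sketch.
    throw new IllegalStateException("Unhandled exception: " + e.getMessage(), e);
  }

  public static void main(String[] args) {
    // A plain Exception stands in for SolrServerException here.
    Exception wrapped = new Exception("server error", new SocketTimeoutException("Read timed out"));
    try {
      handleSolrServerException(wrapped, System.currentTimeMillis());
    } catch (ServiceInterruption si) {
      System.out.println(si.getMessage());
    }
  }
}
```

With this shape, a timeout only becomes fatal for the job if it keeps
recurring until the fail time is reached.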
> Solr Connector doesn't handle embedded SocketTimeoutException properly
> ----------------------------------------------------------------------
>
> Key: CONNECTORS-920
> URL: https://issues.apache.org/jira/browse/CONNECTORS-920
> Project: ManifoldCF
> Issue Type: Bug
> Components: Lucene/SOLR connector
> Affects Versions: ManifoldCF 1.5.1
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.6
>
>
> As reported in the user list:
> "I'm using MCF1.5.1 and Solr4.6.1.
> When I use SolrConnecotor, sometimes SolrServerException occurs.
> Normally, SolrServerException is caught by HttpPoster, line 950.
> But in my case, the inner exception of SolrServerException is
> SocketTimeoutException, not SocketException.
> So, activityCode is set to failed, then mcf interrupt the crawl process.
> In this case, I expect that mcf shouldn't interrupt the crawl process.
> Mcf should skip the invalid file.
> Could you modify the mcf code, or provide a option?
> If you could, I'm glad to being modified in the future version."