[ https://issues.apache.org/jira/browse/CONNECTORS-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738979#comment-13738979 ]
Karl Wright commented on CONNECTORS-764:
----------------------------------------
Ideally, reactivateHopcountRemovedRecords() should be called only when the max
hop count is increased.
Alternatively, when we are about to queue a document that is in the
"hop count exceeded" state, we can check the hopcount table's hop values
to see whether the document's state is in fact incorrect (although that would be
slower). All of these approaches have their problems, however.
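The queue-time check from the second alternative could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not ManifoldCF code: the hopcount table is modeled as an in-memory dict (document id to {link type: minimum recorded hop distance}), a missing entry is assumed to mean "no known path", and the names `exceeds_limits` and `state_is_stale` are hypothetical.

```python
from typing import Dict

# Hypothetical in-memory stand-in for the hopcount table:
# document id -> {link type -> minimum recorded hop distance from a seed}.
HopTable = Dict[str, Dict[str, int]]

HOPCOUNT_EXCEEDED = "hop count exceeded"


def exceeds_limits(hops: Dict[str, int], max_hops: Dict[str, int]) -> bool:
    """True if, for any link type that has a limit, the recorded distance is
    missing (assumed: no known path) or larger than that limit."""
    for link_type, limit in max_hops.items():
        distance = hops.get(link_type)
        if distance is None or distance > limit:
            return True
    return False


def state_is_stale(doc_id: str, state: str, hop_table: HopTable,
                   max_hops: Dict[str, int]) -> bool:
    """Called when we are about to queue doc_id: True if the document is
    marked 'hop count exceeded' but its recorded hops are now within range."""
    if state != HOPCOUNT_EXCEEDED:
        return False
    # This extra lookup per candidate document is the source of the slowdown.
    return not exceeds_limits(hop_table.get(doc_id, {}), max_hops)


hop_table = {"doc-a": {"link": 3}}
print(state_is_stale("doc-a", HOPCOUNT_EXCEEDED, hop_table, {"link": 2}))  # False
print(state_is_stale("doc-a", HOPCOUNT_EXCEEDED, hop_table, {"link": 5}))  # True
```

Raising the limit from 2 to 5 flips the result without any new links being discovered, which is exactly the case the current logic misses.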
A hybrid approach would be to detect an inconsistency (namely, a record that is
in the "hop count exceeded" state when it should not be), and then take remedial
action by calling reactivateHopcountRemovedRecords() for the entire job before
proceeding. That approach still requires fetching the hop counts for all queued
documents that are in the "hop count exceeded" state, which could be problematic.
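The hybrid approach can be sketched as a scan over the queued documents. It stops at the first record wrongly left in the "hop count exceeded" state and triggers a single job-wide reactivation. As with any sketch here, the data model and the names `reconcile_job` and `reactivate` are hypothetical. `reactivate` merely stands in for reactivateHopcountRemovedRecords(), whose real signature is not shown in this issue.

```python
from typing import Callable, Dict, Iterable, Tuple

HOPCOUNT_EXCEEDED = "hop count exceeded"


def exceeds_limits(hops: Dict[str, int], max_hops: Dict[str, int]) -> bool:
    """True if any limited link type has no recorded distance or one over its limit."""
    return any(hops.get(t) is None or hops[t] > limit
               for t, limit in max_hops.items())


def reconcile_job(job_id: str,
                  queue: Iterable[Tuple[str, str]],
                  hop_table: Dict[str, Dict[str, int]],
                  max_hops: Dict[str, int],
                  reactivate: Callable[[str], None]) -> bool:
    """Scan queued (doc_id, state) pairs; on the first document wrongly marked
    'hop count exceeded', reactivate the whole job once and stop."""
    for doc_id, state in queue:
        if state != HOPCOUNT_EXCEEDED:
            continue
        # Fetching hop counts for every such document is the costly part.
        if not exceeds_limits(hop_table.get(doc_id, {}), max_hops):
            reactivate(job_id)  # hypothetical stand-in for reactivateHopcountRemovedRecords()
            return True
    return False


calls = []
queue = [("doc-a", "active"), ("doc-b", HOPCOUNT_EXCEEDED)]
hops = {"doc-a": {"link": 1}, "doc-b": {"link": 4}}
print(reconcile_job("job-1", queue, hops, {"link": 3}, calls.append), calls)  # False []
print(reconcile_job("job-1", queue, hops, {"link": 5}, calls.append), calls)  # True ['job-1']
```

Stopping at the first inconsistency keeps the reactivation to one call per job. The worst case, where no document is inconsistent, still touches every "hop count exceeded" record, which is the cost noted above.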
> Hopcount logic fails to notice when the max number of hops is increased between crawls
> --------------------------------------------------------------------------------------
>
> Key: CONNECTORS-764
> URL: https://issues.apache.org/jira/browse/CONNECTORS-764
> Project: ManifoldCF
> Issue Type: Bug
> Components: Framework crawler agent
> Affects Versions: ManifoldCF 1.3
> Reporter: Karl Wright
> Assignee: Karl Wright
> Fix For: ManifoldCF 1.4
>
>
> When you do something like the following:
> (1) Set the max hops for a job relatively low
> (2) Crawl
> (3) Increase the max hops
> (4) Crawl again
> ... the documents that are labeled with the state "Hop count exceeded" at the
> end of the first crawl are never touched again. This is because there are no
> additional links added to the intrinsiclink table during the second crawl,
> and thus the method reactivateHopcountRemovedRecords() is never called,
> leaving the documents in an incorrect state.
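The failure in steps (1) to (4) can be reduced to a simplified model. The sketch below is not ManifoldCF code. The dict-of-limits representation, the `check_limits` switch and both function names are hypothetical. It contrasts the existing trigger (reactivation only when new intrinsic links are added) with the first idea in the comment above: also reactivating when a hop limit is raised.

```python
from typing import Dict


def limits_increased(old: Dict[str, int], new: Dict[str, int]) -> bool:
    """True if any link type's max hop count was raised, or its limit removed,
    between the previous crawl and this one."""
    for link_type, old_limit in old.items():
        new_limit = new.get(link_type)
        if new_limit is None or new_limit > old_limit:
            return True
    return False


def should_reactivate(new_links_added: int, old: Dict[str, int],
                      new: Dict[str, int], check_limits: bool) -> bool:
    """Decide whether to reactivate hop-count-removed records for a job."""
    if new_links_added > 0:
        # Existing trigger (simplified): new rows in the intrinsiclink table.
        return True
    # Proposed extra trigger (hypothetical): the job's hop limits were raised.
    return check_limits and limits_increased(old, new)


# Steps (1)-(4): limit raised from 2 to 5, but the second crawl adds no new links.
old_limits, new_limits = {"link": 2}, {"link": 5}
print(should_reactivate(0, old_limits, new_limits, check_limits=False))  # False
print(should_reactivate(0, old_limits, new_limits, check_limits=True))   # True
```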