[ https://issues.apache.org/jira/browse/CONNECTORS-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright updated CONNECTORS-145:
-----------------------------------
Component/s: Framework crawler agent
Description:
The expire stuffer thread puts documents from both the PENDING and
PENDINGPURGATORY states into the ACTIVE and ACTIVEPURGATORY states. The expire
threads should deal with errors by moving the candidate document back into its
original state for a later expiration attempt, but right now the Expire Thread
simply blocks and retries, essentially using up an expire thread for the
duration of the outage.
In a time when there was only one output connection, this was acceptable logic,
but since there are multiple such connections possible now, it is a potential
liability.
Thus, logic must change in the Expire Thread to perform the appropriate error
recovery.
The other thread family that has this problem is the Document Delete Thread
family. These threads will require some thought to fix because there is
currently no deletion scheduling field in the jobqueue database table, and yet
we'd need one if we were going to fix this problem in an appropriate manner.
However, we can probably reuse the checktime field for this purpose if we are
clever.
was:
The expire stuffer thread puts documents from both the PENDING and
PENDINGPURGATORY states into the BEINGDELETED state. But this state is
insufficient to handle proper error recovery in the case of a down search
engine blocking expiration. What should happen is that the error from the
failed expiration attempt should move the candidate document back into its
original state for a later expiration attempt, but right now the Expire Thread
simply blocks and retries, essentially using up an expire thread for the
duration of the outage.
In a time when there was only one output connection, this was acceptable logic,
but since there are multiple such connections possible now, it is a potential
liability.
The way forward is to introduce a new document state, BEINGDELETEDPURGATORY,
and ensure that the mapping between PENDING->BEINGDELETED and
PENDINGPURGATORY->BEINGDELETEDPURGATORY is preserved. Then, logic must change
in the Expire Thread to perform the appropriate error recovery.
The other thread family that has this problem is the Document Delete Thread
family. These threads will require some thought to fix because there is
currently no deletion scheduling field in the jobqueue database table, and yet
we'd need one if we were going to fix this problem in an appropriate manner.
However, we can probably reuse the checktime field for this purpose if we are
clever.
Summary: The logic for dealing with a downed Search Engine in
ExpireThread is not optimal (was: The BEINGDELETED document state is
insufficient to allow proper error recovery from a search engine that's down)
> The logic for dealing with a downed Search Engine in ExpireThread is not
> optimal
> --------------------------------------------------------------------------------
>
> Key: CONNECTORS-145
> URL: https://issues.apache.org/jira/browse/CONNECTORS-145
> Project: ManifoldCF
> Issue Type: Bug
> Components: Framework crawler agent
> Reporter: Karl Wright
>
> The expire stuffer thread puts documents from both the PENDING and
> PENDINGPURGATORY states into the ACTIVE and ACTIVEPURGATORY states. The
> expire threads should deal with errors by moving the candidate document back
> into its original state for a later expiration attempt, but right now the
> Expire Thread simply blocks and retries, essentially using up an expire
> thread for the duration of the outage.
> In a time when there was only one output connection, this was acceptable
> logic, but since there are multiple such connections possible now, it is a
> potential liability.
> Thus, logic must change in the Expire Thread to perform the appropriate error
> recovery.
> The other thread family that has this problem is the Document Delete Thread
> family. These threads will require some thought to fix because there is
> currently no deletion scheduling field in the jobqueue database table, and
> yet we'd need one if we were going to fix this problem in an appropriate
> manner. However, we can probably reuse the checktime field for this purpose
> if we are clever.
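
To make the proposed recovery concrete, here is a minimal Java sketch of
"move the document back to its original state instead of blocking". It is not
ManifoldCF code. The class, the QueuedDoc record, the OutputConnection
interface, the ServiceDownException type and the retry delay are all
hypothetical names invented for illustration. Only the four state names and
the idea of a checktime field come from the description above, and using
checktime to schedule the retry of an expiration is itself an assumption.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative sketch only; none of these types exist in ManifoldCF.
final class ExpireRecoverySketch {
    // State names taken from the issue description.
    enum DocState { PENDING, PENDINGPURGATORY, ACTIVE, ACTIVEPURGATORY }

    // Hypothetical stand-in for one row of the jobqueue table.
    static final class QueuedDoc {
        DocState state;
        long checkTime; // assumed meaning: earliest time of the next attempt
        QueuedDoc(DocState state, long checkTime) {
            this.state = state;
            this.checkTime = checkTime;
        }
    }

    // Hypothetical signal that the search engine is unreachable.
    static final class ServiceDownException extends Exception {
        ServiceDownException(String message) { super(message); }
    }

    // Hypothetical output connection that removes a document from the index.
    interface OutputConnection {
        void removeDocument(QueuedDoc doc) throws ServiceDownException;
    }

    // Reverse mapping used to restore the state the stuffer found.
    private static final Map<DocState, DocState> ORIGINAL_STATE =
        new EnumMap<>(DocState.class);
    static {
        ORIGINAL_STATE.put(DocState.ACTIVE, DocState.PENDING);
        ORIGINAL_STATE.put(DocState.ACTIVEPURGATORY, DocState.PENDINGPURGATORY);
    }

    // What the expire stuffer does: PENDING -> ACTIVE,
    // PENDINGPURGATORY -> ACTIVEPURGATORY.
    static void stuff(QueuedDoc doc) {
        switch (doc.state) {
            case PENDING: doc.state = DocState.ACTIVE; break;
            case PENDINGPURGATORY: doc.state = DocState.ACTIVEPURGATORY; break;
            default: throw new IllegalStateException("not expirable: " + doc.state);
        }
    }

    // One expiration attempt. On an outage the document is put back into its
    // original state and rescheduled, so the calling thread is freed at once
    // rather than blocking and retrying. Returns true if the removal succeeded.
    static boolean expireOnce(QueuedDoc doc, OutputConnection out,
                              long now, long retryDelayMs) {
        DocState original = ORIGINAL_STATE.get(doc.state);
        if (original == null) {
            throw new IllegalStateException("not stuffed: " + doc.state);
        }
        try {
            out.removeDocument(doc);
            return true;
        } catch (ServiceDownException e) {
            doc.state = original;               // restore pre-stuffing state
            doc.checkTime = now + retryDelayMs; // defer the next attempt
            return false;
        }
    }
}
```

With this shape, an outage on one output connection costs a single failed
call per document rather than tying up an expire thread for the duration of
the outage, which is the liability described once multiple output connections
are possible.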