Stuffer query will perform poorly under some conditions
-------------------------------------------------------
Key: CONNECTORS-290
URL: https://issues.apache.org/jira/browse/CONNECTORS-290
Project: ManifoldCF
Issue Type: Bug
Components: Framework agents process
Affects Versions: ManifoldCF 0.3, ManifoldCF 0.2, ManifoldCF 0.1, ManifoldCF 0.4
Reporter: Karl Wright
Assignee: Karl Wright
Fix For: ManifoldCF 0.4

The stuffer query, which returns documents in index order by docpriority for processing, performs poorly when lots of documents are in the queue and have a good priority but can't be taken because of job state. This can happen when:

(1) a large job is aborted, leaving lots of jobqueue records with docpriority values around;
(2) a job is paused for an extended period of time, while others are running.

In the second case, when the paused job is resumed, there's an added problem because, for a while, only documents from the paused job will be processed.

The answer to (1) may well be to clean out all docpriority values on job abort. Right now there is no logic that sets docpriority values to null, but there clearly needs to be, or the docpriority index will remain polluted with rows that must be scanned but cannot be used for an extended period of time.

The "correct" answer to (2) is to clear out docpriority values when a job is paused, and then redo them all when the job is resumed. Similarly, docpriority values should be set for all of a job's documents when a job is started, and should be nulled out when documents enter non-active states. The former currently occurs, but not the latter.
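
To illustrate the effect described in this issue, here is a minimal in-memory sketch in Python. It is not ManifoldCF code and does not use the real jobqueue schema. The names QueueRow, stuff, clear_priorities and build_queue are hypothetical, and the sketch assumes that a lower docpriority value is processed first and that rows with a null docpriority are not visited by the scan. It only counts how many index entries the stuffer would have to walk before and after the priorities of an aborted job are nulled out.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a jobqueue row; not the ManifoldCF schema.
@dataclass
class QueueRow:
    doc_id: int
    job_id: str
    docpriority: Optional[float]  # None models a nulled-out priority


def stuff(rows, active_jobs, limit):
    """Walk rows in docpriority order, as an index scan would.

    Returns (picked doc ids, number of index entries visited).
    Assumption: lower docpriority is better, and null priorities
    are treated as absent from the index.
    """
    indexed = sorted(
        (r for r in rows if r.docpriority is not None),
        key=lambda r: r.docpriority,
    )
    picked, visited = [], 0
    for row in indexed:
        if len(picked) >= limit:
            break
        visited += 1
        # Rows whose job is not active are scanned but cannot be taken.
        if row.job_id in active_jobs:
            picked.append(row.doc_id)
    return picked, visited


def clear_priorities(rows, job_id):
    """Null out docpriority for every row of one job (e.g. on abort or pause)."""
    for row in rows:
        if row.job_id == job_id:
            row.docpriority = None


def build_queue():
    # 1000 well-prioritized rows left by an aborted job, 10 rows from a running job.
    rows = [QueueRow(i, "aborted", float(i)) for i in range(1000)]
    rows += [QueueRow(1000 + i, "running", 5000.0 + i) for i in range(10)]
    return rows


if __name__ == "__main__":
    rows = build_queue()
    _, visited_before = stuff(rows, {"running"}, limit=5)
    clear_priorities(rows, "aborted")
    _, visited_after = stuff(rows, {"running"}, limit=5)
    print("visited before:", visited_before, "after:", visited_after)
```

With this toy data, the same five documents are picked in both runs, but the scan visits 1005 entries before the aborted job's priorities are cleared and 5 afterwards. A real fix would also need to reassign priorities on resume, which this sketch does not model.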