[ 
https://issues.apache.org/jira/browse/CONNECTORS-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13599873#comment-13599873
 ] 

Karl Wright commented on CONNECTORS-663:
----------------------------------------

I'm thinking that the basic new job cycle will be called a "minimal" cycle, and 
will consist of:

- Seeding
- Processing the seeded documents and all documents discovered from them

So, for a MODEL_ADD connector, only additions will be crawled.  For a 
MODEL_ADD_CHANGE connector, additions and modifications will be crawled, etc.  
The UI will have a "Start minimal" clickable link in addition to a "Start" link.
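
To make the proposal concrete, here is a rough sketch of how a minimal cycle might differ from a full one. Everything in it is hypothetical: the class, the enums, and the method names are illustrative only and are not the framework's API. The DELETION_CLEANUP phase name and the ADD_CHANGE_DELETE case are assumptions based on the issue description and on the "etc." above.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class MinimalCycleSketch {

    // Illustrative stand-ins for the connector models named above;
    // these are NOT the framework's own constants.
    enum Model { ADD, ADD_CHANGE, ADD_CHANGE_DELETE }

    enum ChangeKind { ADDITION, MODIFICATION, DELETION }

    // Hypothetical phase names for a job run.
    enum Phase { SEEDING, PROCESSING, DELETION_CLEANUP }

    /** Phases of a job run; a minimal run stops after processing. */
    static List<Phase> phases(boolean minimal) {
        List<Phase> result = new ArrayList<>();
        result.add(Phase.SEEDING);
        result.add(Phase.PROCESSING);
        if (!minimal) {
            // Only the full run tries to bring deletions in sync.
            result.add(Phase.DELETION_CLEANUP);
        }
        return result;
    }

    /** Kinds of change a minimal run would crawl for a given model. */
    static Set<ChangeKind> crawledInMinimalRun(Model model) {
        switch (model) {
            case ADD:
                return EnumSet.of(ChangeKind.ADDITION);
            case ADD_CHANGE:
                return EnumSet.of(ChangeKind.ADDITION, ChangeKind.MODIFICATION);
            case ADD_CHANGE_DELETE:
                // Assumption: one reading of the "etc." in the comment.
                return EnumSet.allOf(ChangeKind.class);
            default:
                throw new IllegalArgumentException("unknown model: " + model);
        }
    }

    public static void main(String[] args) {
        System.out.println("minimal run: " + phases(true));
        System.out.println("full run:    " + phases(false));
        for (Model m : Model.values()) {
            System.out.println(m + " -> " + crawledInMinimalRun(m));
        }
    }
}
```

The point of the sketch is only that the two run types share seeding and processing, and that what a minimal run picks up depends entirely on what the connector's seeding reports.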


                
> ManifoldCF needs the ability to not always check for deletion on a crawl
> ------------------------------------------------------------------------
>
>                 Key: CONNECTORS-663
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-663
>             Project: ManifoldCF
>          Issue Type: New Feature
>          Components: Framework agents process
>    Affects Versions: ManifoldCF 1.1.1
>            Reporter: Karl Wright
>            Assignee: Karl Wright
>             Fix For: ManifoldCF next
>
>
> The ManifoldCF framework's crawling model always brings the index in sync 
> with the repository by the end of the job.  Unfortunately, for many 
> repositories, the incremental nature of ManifoldCF is lost in part because 
> deletion tracking is not done by the repository.  ManifoldCF could therefore 
> benefit from the ability to have two different job run cycles: (1) a full run, 
> as is done now, and (2) a partial run, which does not necessarily attempt to 
> clean up deletions.  This of course only makes sense if subsequent job runs 
> have the ability to do the deletion cleanup.
> In principle, I believe this can work, but it has significant implications in 
> the following areas:
> - Job states - there needs to be a new set of job states corresponding to 
> which type of job run is selected;
> - UI - there needs to be a way of telling ManifoldCF what kind of job run is 
> desired;
> - API - same problem as UI;
> - Job scheduling - we need the ability to determine what kind of job run is 
> done when, which also has schema implications.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
