Hi Shinichiro,

All of ManifoldCF's state information is in the database, which maintains consistency because it is ACID. You can stop the ManifoldCF agents process and start it up again, and the crawl will begin where it stopped. The framework has been very carefully designed to not get confused in any way when this is done. This resilience is in fact one of the primary design criteria of ManifoldCF.

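
To make the idea concrete, here is a rough sketch of why keeping all crawl state in a transactional database makes a stop and restart safe. This is not ManifoldCF's actual code or schema. The `queue` table, its columns, and the functions `open_db`, `enqueue`, `crawl`, and `demo` are all hypothetical names made up for illustration, and it uses SQLite through Python's standard `sqlite3` module rather than PostgreSQL so that it is self-contained.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    """Open (or reopen) the database that holds all crawl state."""
    conn = sqlite3.connect(path)
    # Hypothetical table: one row per document, with a processing status.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        "doc_id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def enqueue(conn, doc_ids, fail_after=None):
    """Queue a batch of documents in one transaction.

    fail_after simulates a shutdown partway through the batch.
    Returns True if the batch was committed, False if it was rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for i, doc_id in enumerate(doc_ids):
                if fail_after is not None and i == fail_after:
                    raise RuntimeError("simulated shutdown during enqueue")
                conn.execute(
                    "INSERT OR IGNORE INTO queue VALUES (?, 'pending')",
                    (doc_id,),
                )
    except RuntimeError:
        return False
    return True


def crawl(conn, stop_after=None):
    """Process pending documents; stop_after simulates the process stopping."""
    processed = []
    while True:
        row = conn.execute(
            "SELECT doc_id FROM queue WHERE status = 'pending' "
            "ORDER BY doc_id LIMIT 1"
        ).fetchone()
        if row is None or (stop_after is not None and len(processed) >= stop_after):
            return processed
        with conn:  # each status change is committed durably on its own
            conn.execute(
                "UPDATE queue SET status = 'done' WHERE doc_id = ?", (row[0],)
            )
        processed.append(row[0])


def demo():
    """Stop a crawl midway, reopen the database, and finish it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "state.db")
        conn = open_db(path)
        enqueue(conn, ["a", "b", "c", "d"])
        enqueue(conn, ["e", "f"], fail_after=1)  # rolled back: "e" is not queued
        first = crawl(conn, stop_after=2)
        conn.close()  # the process "stops" here
        conn = open_db(path)  # restart: state is read back from the database
        rest = crawl(conn)
        conn.close()
        return first, rest
```

In this sketch, `demo()` returns `(["a", "b"], ["c", "d"])`: the second run picks up exactly the documents the first run did not finish, and the interrupted batch leaves no partial rows behind. A real crawler has many more states and concurrent workers, but the principle is the same: nothing about the crawl's progress lives only in memory.
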
Exactly how crawls are done is covered in ManifoldCF in Action, chapters 11 and 12. I'll send those to you privately.

Thanks,
Karl

On Thu, Jun 16, 2011 at 7:09 PM, Shinichiro Abe <[email protected]> wrote:
> Hi.
> Please let me know about resume mechanism.
>
> For example, when job is executing, the following things happen.
> MCF services stop, Solr shutdown, repository servers shutdown.
> The job can not connect eace connectors by shutdown, it stops to ingest
> documents.
> But when the above things are recovered, the job starts to resume ingesting,
> it can keep crawling consistency.
> What manages it? Does jobqueue manage this resume mechanism?
>
> If so, are there cases that job can not keep crawling consistency?
> e.g. the following cases.
> a)Postgresql stops before inserting all into jobqueue, jobqueue data is
> short and inconsistent.
> b)Though it needs to crawl a lot of documents, MCF stops before inserting
> all into jobqueue. As a result, jobqueue data is short and inconsistent.
> c)Any other cases.
> I want to know the possibility that data is inconsistent by halfway
> interrupting when crawling.
>
> Also I want to read Part4 MCF architecture on ManifoldCFinAction.
> Regards,
> Shinichiro Abe
