Yes, if MCF receives a 404 response, it will delete the document from the index.
Continuous crawling, though, means the document may not be retried for a long time. Exponential backoff is used.

Karl

On Tue, Sep 3, 2019, 1:36 AM Priya Arora <pr...@smartshore.nl> wrote:

> Yes, it's a continuous job.
>
> On Tue, Sep 3, 2019 at 11:05 AM Priya Arora <pr...@smartshore.nl> wrote:
>
> > Hi,
> > I have a job, myuniversity_intranet, which crawls data from an intranet
> > site, and the data has been indexed in an index.
> > My question is: does ManifoldCF have functionality to test whether a URL
> > exists before indexing it?
> > For example, my index (say, index name: abc) contains an indexed URL,
> > https:myuniversity/reaserch/info (an intranet URL). This URL existed
> > earlier but no longer exists, and the resulting status is 404.
> >
> > The question is: can ManifoldCF check, before indexing, that the status
> > is not 404 (that is, the URL exists), index the URL only if it really
> > exists, and otherwise skip it?
> > Can this be set while configuring the ManifoldCF job, or do I have to
> > handle it manually in code?
> >
> > Kind regards
> > Priya
> >
> > On Mon, Sep 2, 2019 at 8:19 PM Karl Wright <daddy...@gmail.com> wrote:
> >
> >> Hi,
> >> You aren't giving me enough information to know why your job isn't
> >> rechecking URLs. Please tell me how your job is configured,
> >> specifically whether it's continuous or not. Thanks.
> >>
> >> Karl
> >>
> >> On Mon, Sep 2, 2019 at 4:47 AM Priya Arora <pr...@smartshore.nl> wrote:
> >>
> >> > Hi,
> >> >
> >> > I have a question about ManifoldCF. Does it have functionality to
> >> > check whether the URL it is crawling actually exists or returns
> >> > page not found (404)?
> >> >
> >> > I have a requirement in which I am crawling data for a university,
> >> > and the job runs continuously. After some time, we found that certain
> >> > URLs have been removed from the university site but are still
> >> > being indexed.
> >> >
> >> > Some pages have been marked with status 404.
> >> > How can ManifoldCF be automated to check this, so that if a URL
> >> > returns 404 (does not exist anymore), it is not indexed?
> >> >
> >> > Thanks
> >> > Priya.
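
To illustrate the two behaviours described in the reply at the top of this thread (a 404 removes the document from the index, and rechecks in a continuous job are spaced out by exponential backoff), here is a minimal Python sketch. It is not ManifoldCF code and does not use its API: the helper names `apply_fetch_result` and `recheck_delays`, the in-memory dictionary standing in for the index, the example URL, and the delay values (60 minutes initial, 1440 minutes maximum, doubling factor) are all assumptions chosen for illustration only.

```python
from typing import Dict, Iterator


def apply_fetch_result(index: Dict[str, str], url: str, status: int, body: str = "") -> None:
    """Update a toy in-memory index from one fetch result.

    Hypothetical helper: a 404 removes the document, a 200 (re)indexes it,
    and any other status leaves the index untouched.
    """
    if status == 404:
        index.pop(url, None)  # page is gone, so drop it from the index
    elif status == 200:
        index[url] = body  # page exists, so index the current content


def recheck_delays(initial: int, maximum: int, factor: int = 2, count: int = 6) -> Iterator[int]:
    """Yield successive recheck delays that grow by `factor` up to `maximum`."""
    delay = initial
    for _ in range(count):
        yield delay
        # Each later recheck waits longer, but never longer than the cap.
        delay = min(delay * factor, maximum)


if __name__ == "__main__":
    # Assumed example URL; not taken from the thread.
    url = "https://intranet.example/research/info"
    index = {url: "old page text"}

    apply_fetch_result(index, url, 404)
    print(index)

    # Delays in minutes under the assumed settings.
    print(list(recheck_delays(initial=60, maximum=1440)))
```

Under these assumed settings the gap between rechecks doubles each time until it reaches the cap, which is why a page that has started returning 404 can stay in the index for a while before the crawler revisits it and removes it.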