Yes, it's a continuous job.

On Tue, Sep 3, 2019 at 11:05 AM Priya Arora <pr...@smartshore.nl> wrote:
> Hi,
>
> I have a job, myuniversity_intranet, which crawls data from an intranet
> site, and the data has been indexed in an index.
> My question is: does ManifoldCF have functionality to test a URL before
> indexing, to check whether the URL exists?
> For example, in my index (say, index name: abc) I have an indexed URL,
> https:myuniversity/reaserch/info (an intranet URL). This URL existed
> earlier but no longer exists, and the resulting status is 404.
>
> The question is: can ManifoldCF check before indexing that the status is
> not 404 (meaning the URL exists), index the URL only if it really exists,
> and otherwise skip it?
> Can this be set up while configuring the ManifoldCF job, or do I have to
> handle it manually in code?
>
> Kind regards
> Priya
>
> On Mon, Sep 2, 2019 at 8:19 PM Karl Wright <daddy...@gmail.com> wrote:
>
>> Hi,
>> You aren't giving me enough information to know why your job isn't
>> rechecking URLs. Please tell me how your job is configured, specifically
>> whether it's continuous or not. Thanks.
>>
>> Karl
>>
>>
>> On Mon, Sep 2, 2019 at 4:47 AM Priya Arora <pr...@smartshore.nl> wrote:
>>
>> > Hi,
>> >
>> > I have a question regarding ManifoldCF. Does it have functionality to
>> > check whether the URL it is crawling actually exists or returns
>> > page not found (404)?
>> >
>> > I have a requirement in which I am crawling data for a university and
>> > the job is running continuously. After some time, it was found that
>> > certain URLs had been removed from the university site but are still
>> > being indexed.
>> >
>> > Some pages have been marked with status 404.
>> > How can ManifoldCF be automated to check this, so that if a URL
>> > returns 404 (it does not exist anymore), it is not indexed?
>> >
>> > Thanks
>> > Priya.
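For the manual route raised above ("do I have to handle it manually in code"), here is a minimal sketch of a standalone status check. It is not ManifoldCF functionality and says nothing about how ManifoldCF itself rechecks URLs. The names fetch_status and urls_returning_404 are hypothetical, and using a HEAD request is an assumption: some servers answer HEAD differently from GET.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request to url.

    Hypothetical helper for illustration only. Network failures
    (urllib.error.URLError, timeouts) are not caught and propagate
    to the caller.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the status
        # code carried by the exception is the value we want.
        return err.code


def urls_returning_404(
    urls: Iterable[str],
    get_status: Callable[[str], int] = fetch_status,
) -> List[str]:
    """Return the URLs whose status code is 404, in input order.

    get_status can be replaced (for example in tests) so the check
    runs without touching the network.
    """
    return [url for url in urls if get_status(url) == 404]
```

The result could be used to skip those URLs before indexing or to delete them from the index afterwards. Which of the two fits depends on the setup, and that is not covered by this sketch.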