I was able to reproduce this; CONNECTORS-1242.

Karl
On Thu, Sep 17, 2015 at 12:45 PM, Karl Wright <[email protected]> wrote:

> I'm interested in the time it is supposed to be processed, actually.
>
> I'm trying to recreate your example here to see if I can get more
> information.
>
> Karl
>
> On Thu, Sep 17, 2015 at 12:36 PM, Colreavy, Niall <[email protected]> wrote:
>
>> The document is in a state of 'Processed' and the status is 'Ready for
>> processing'.
>>
>> -----Original Message-----
>> From: Karl Wright [mailto:[email protected]]
>> Sent: 17 September 2015 5:28
>> To: dev
>> Subject: Re: Potential Issue with pausing jobs
>>
>> When it is in the state after the job has resumed, can you do a Document
>> Status report and tell me what that says for your document?
>>
>> Thanks,
>> Karl
>>
>> On Thu, Sep 17, 2015 at 12:16 PM, Colreavy, Niall <[email protected]> wrote:
>>
>>> Hi Karl,
>>>
>>> Thanks for that. I think the problem might be more fundamental. When I
>>> start my job and monitor the simple job history, I can see the job doing
>>> things like:
>>>
>>> Run the seed query
>>> Run the data query
>>> Run the seed query
>>> Run the data query
>>>
>>> Etc.
>>>
>>> It continues to do this indefinitely from what I have observed. As soon
>>> as I pause and resume the job, all I can see in the simple job history
>>> is:
>>>
>>> Run the seed query
>>> Run the seed query
>>> Run the seed query
>>>
>>> It's like it's never going to run the data query again.
>>>
>>> Kind Regards,
>>>
>>> Niall
>>>
>>> -----Original Message-----
>>> From: Karl Wright [mailto:[email protected]]
>>> Sent: 17 September 2015 4:53
>>> To: dev
>>> Subject: Re: Potential Issue with pausing jobs
>>>
>>> Hi Niall,
>>>
>>> A continuous job reseeds on a schedule, which you set as part of the job
>>> setup. For a continuous job, if the document has been crawled, it will
>>> be recrawled again at a specific time in the future, and if at that time
>>> it hasn't changed, it will be scheduled for checking again even further
>>> out, up to a certain limit (also settable within the job).
>>>
>>> You can look at the document's schedule, by the way, using the "Document
>>> Status" report, and it should be pretty clear from that what should
>>> happen and when.
>>>
>>> When you abort the job and restart it, everything is reset, so the
>>> document will be checked immediately at that point, and relatively
>>> frequently for a while until the system figures out that the document
>>> isn't changing very rapidly.
>>>
>>> Thanks,
>>> Karl
>>>
>>> On Thu, Sep 17, 2015 at 11:38 AM, Colreavy, Niall <[email protected]> wrote:
>>>
>>>> Hi Karl,
>>>>
>>>> You'll have to forgive me if my answer is a bit uncertain, but I am
>>>> very new to MCF. Just to clarify, I have a very simple job. For the
>>>> JDBC connector, I am literally just selecting 1 for the id, 'myurl' for
>>>> the url, and 'mydata' for the data. So there is only ever 1 document
>>>> being processed.
>>>>
>>>> So to answer the questions:
>>>>
>>>> 1. There are 0 active documents on the queue.
>>>> 2. Single process.
>>>> 3. Yes, this is a continuous crawl.
>>>>
>>>> Kind Regards,
>>>>
>>>> Niall
>>>>
>>>> -----Original Message-----
>>>> From: Karl Wright [mailto:[email protected]]
>>>> Sent: 17 September 2015 4:27
>>>> To: dev
>>>> Subject: Re: Potential Issue with pausing jobs
>>>>
>>>> Hi Niall,
>>>>
>>>> Pausing and resuming a job should have no effects *other* than
>>>> reprioritization of the active documents on the queue, which, if there
>>>> are a lot of them, may take some time.
>>>>
>>>> So let's ask some basic questions. (1) How many active documents on
>>>> your queue? (2) What kind of synchronization are you using? Is this
>>>> single process, or multiprocess? (3) Is this a continuous crawl?
>>>>
>>>> >>>>>>
>>>> And on a side note, what is the difference between pausing a job and
>>>> aborting a job?
>>>> <<<<<<
>>>>
>>>> I can't fully answer that unless I know the characteristics of your
>>>> job, especially continuous crawl vs. crawl to completion.
>>>>
>>>> Karl
>>>>
>>>> On Thu, Sep 17, 2015 at 11:07 AM, Colreavy, Niall <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am experimenting with pausing a job. The job has a simple JDBC
>>>>> connection and a null output connection. When I pause and then resume
>>>>> the job, and monitor its progress in the simple history report, the
>>>>> job never seems to run the data query any more. I can see that it runs
>>>>> the seed query, but it doesn't progress to the data query. If I abort
>>>>> the job and restart it, it does seem to start running the data query
>>>>> again.
>>>>>
>>>>> Can anyone explain this behaviour? And on a side note, what is the
>>>>> difference between pausing a job and aborting a job?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Niall
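For anyone trying to reproduce this, the single-document job Niall describes maps onto the JDBC connector's seeding and data queries roughly as follows. This is a sketch, not his exact configuration: `$(IDCOLUMN)`, `$(URLCOLUMN)`, `$(DATACOLUMN)`, and `$(IDLIST)` are the connector's standard substitution placeholders, but the precise literals, and whether your database needs a FROM clause (e.g. `FROM DUAL`) for a constant SELECT, are assumptions.

```sql
-- Seeding query: returns a single constant document identifier.
SELECT 1 AS $(IDCOLUMN)

-- Data query: for each identifier ManifoldCF hands back in $(IDLIST),
-- returns the id plus a constant URL and constant document content.
SELECT 1 AS $(IDCOLUMN),
       'myurl' AS $(URLCOLUMN),
       'mydata' AS $(DATACOLUMN)
WHERE 1 IN $(IDLIST)
```

With queries like these there is exactly one document in the job, so the expected simple-history pattern is an alternation of seed and data queries; seeing only seed queries after a pause/resume is what CONNECTORS-1242 tracks.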
