Aborting a job, or restarting it, is perfectly safe and will lose no data. As I said before, the difference is that pausing does not disrupt the document fetching and seeding schedules, while aborting resets them, so everything starts over schedule-wise.

Karl

On Fri, Sep 18, 2015 at 5:31 AM, Colreavy, Niall <[email protected]> wrote:

> Hi Karl,
>
> Thanks for looking into that. In the interim, we are going to abort,
> rather than pause the job to circumvent the issue. Just out of curiosity,
> what is the difference between aborting the job and pausing the job? We
> would just be a little bit concerned that there would be adverse effects
> from regularly aborting the job.
>
> Thanks,
>
> Niall
>
> -----Original Message-----
> From: Karl Wright [mailto:[email protected]]
> Sent: 17 September 2015 5:53
> To: dev
> Subject: Re: Potential Issue with pausing jobs
>
> I was able to reproduce this; CONNECTORS-1242.
>
> Karl
>
> On Thu, Sep 17, 2015 at 12:45 PM, Karl Wright <[email protected]> wrote:
>
> > I'm interested in the time it is supposed to be processed, actually.
> >
> > I'm trying to recreate your example here to see if I can get more
> > information.
> >
> > Karl
> >
> > On Thu, Sep 17, 2015 at 12:36 PM, Colreavy, Niall <
> > [email protected]> wrote:
> >
> >> The document is in a state of 'Processed' and the status is 'Ready for
> >> processing'
> >>
> >> -----Original Message-----
> >> From: Karl Wright [mailto:[email protected]]
> >> Sent: 17 September 2015 5:28
> >> To: dev
> >> Subject: Re: Potential Issue with pausing jobs
> >>
> >> When it is in the state after the job has resumed, can you do a Document
> >> Status report and tell me what that says for your document?
> >>
> >> Thanks,
> >> Karl
> >>
> >> On Thu, Sep 17, 2015 at 12:16 PM, Colreavy, Niall <
> >> [email protected]> wrote:
> >>
> >> > Hi Karl,
> >> >
> >> > Thanks for that. I think the problem might be more fundamental. When I
> >> > start my job and monitor the simple job history I can see the job doing
> >> > things like:
> >> >
> >> > Run the seed query
> >> > Run the data query
> >> > Run the seed query
> >> > Run the data query
> >> >
> >> > Etc.
> >> >
> >> > It continues to do this indefinitely from what I have observed. As soon as
> >> > I pause and resume the job, all I can see in the simple job history is:
> >> >
> >> > Run the seed query
> >> > Run the seed query
> >> > Run the seed query
> >> >
> >> > It's like it's never going to run the data query again?
> >> >
> >> > Kind Regards,
> >> >
> >> > Niall
> >> >
> >> > -----Original Message-----
> >> > From: Karl Wright [mailto:[email protected]]
> >> > Sent: 17 September 2015 4:53
> >> > To: dev
> >> > Subject: Re: Potential Issue with pausing jobs
> >> >
> >> > Hi Niall,
> >> >
> >> > A continuous job reseeds on a schedule, which you set as part of the job
> >> > setup. For a continuous job, if the document has been crawled, it will be
> >> > recrawled again at a specific time in the future, and if at that time it
> >> > hasn't changed, it will be scheduled for checking again even further out,
> >> > up to a certain limit (also settable within the job).
> >> >
> >> > You can look at the document's schedule, by the way, using the "Document
> >> > Status" report, and it should be pretty clear from that what should happen
> >> > and when.
> >> >
> >> > When you abort the job and restart it, everything is reset, so the document
> >> > will be checked immediately at that point, and relatively frequently for a
> >> > while until the system figures out that the document isn't changing very
> >> > rapidly.
> >> >
> >> > Thanks,
> >> > Karl
> >> >
> >> > On Thu, Sep 17, 2015 at 11:38 AM, Colreavy, Niall <
> >> > [email protected]> wrote:
> >> >
> >> > > Hi Karl,
> >> > >
> >> > > You'll have to forgive me if my answer is a bit uncertain but I am very
> >> > > new to MCF. Just to clarify, I have a very simple job. For the JDBC
> >> > > connector, I am literally just selecting 1 for the id, 'myurl' for the url
> >> > > and 'mydata' for the data. So there is only ever 1 document being processed.
> >> > >
> >> > > So to answer the questions:
> >> > >
> >> > > 1. There are 0 active documents on the queue.
> >> > > 2. Single process
> >> > > 3. Yes, this is a continuous crawl.
> >> > >
> >> > > Kind Regards,
> >> > >
> >> > > Niall
> >> > >
> >> > > -----Original Message-----
> >> > > From: Karl Wright [mailto:[email protected]]
> >> > > Sent: 17 September 2015 4:27
> >> > > To: dev
> >> > > Subject: Re: Potential Issue with pausing jobs
> >> > >
> >> > > Hi Niall,
> >> > >
> >> > > Pausing and resuming a job should have no effects *other* than
> >> > > reprioritization of the active documents on the queue, which if there are a
> >> > > lot of them, may take some time.
> >> > >
> >> > > So let's ask some basic questions. (1) How many active documents on your
> >> > > queue? (2) What kind of synchronization are you using? Is this single
> >> > > process, or multiprocess? (3) Is this a continuous crawl?
> >> > >
> >> > > >>>>>>
> >> > > And on a side note, what is the difference between pausing a job and
> >> > > aborting a job?
> >> > > <<<<<<
> >> > >
> >> > > I can't fully answer that unless I know the characteristics of your job,
> >> > > especially continuous crawl vs. crawl to completion.
> >> > >
> >> > > Karl
> >> > >
> >> > > On Thu, Sep 17, 2015 at 11:07 AM, Colreavy, Niall <
> >> > > [email protected]> wrote:
> >> > >
> >> > > > Hi,
> >> > > >
> >> > > > I am experimenting with pausing a job. The job has a simple JDBC
> >> > > > connection and a null output connection. I was experimenting with pausing
> >> > > > the job and I notice that when I resume the job, and monitor it's progress
> >> > > > in the simple history report, the job never seems to run the data query any
> >> > > > more. I can see that it runs the seed query but it doesn't progress to the
> >> > > > data query. If I abort the job and restart it, it does seem to start
> >> > > > running the data query again.
> >> > > >
> >> > > > Can anyone explain this behaviour? And on a side note, what is the
> >> > > > difference between pausing a job and aborting a job?
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Niall
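
For readers of this thread, the scheduling behaviour described above (an unchanged document is rechecked at growing intervals up to a limit, a pause leaves that schedule alone, and an abort plus restart resets it) can be sketched as a toy model. This is not ManifoldCF's actual scheduler: the class and method names, the doubling rule, the reset-on-change rule, and the interval values are all assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Toy model only: NOT ManifoldCF's real scheduler. All names, the doubling
# rule and the interval values are assumptions made for illustration.


@dataclass
class DocSchedule:
    base_interval: int = 60   # seconds until the first recheck (assumed)
    max_interval: int = 960   # upper limit on the recheck interval (assumed)
    interval: int = 60        # current recheck interval
    next_check: int = 0       # time at which the document is next due

    def checked(self, now: int, changed: bool) -> None:
        """Record a check: back off if unchanged, start over if changed."""
        if changed:
            self.interval = self.base_interval  # assumed behaviour
        else:
            self.interval = min(self.interval * 2, self.max_interval)
        self.next_check = now + self.interval

    def pause_and_resume(self) -> None:
        """Pausing and resuming leaves the schedule untouched."""

    def abort_and_restart(self, now: int) -> None:
        """Aborting discards the schedule; the document is due immediately."""
        self.interval = self.base_interval
        self.next_check = now


doc = DocSchedule()
now = 0
for _ in range(3):            # three checks of a document that never changes
    now = doc.next_check
    doc.checked(now, changed=False)
after_backoff = (doc.interval, doc.next_check)

doc.pause_and_resume()        # schedule is preserved
after_pause = (doc.interval, doc.next_check)

doc.abort_and_restart(now=1000)   # schedule starts over
after_abort = (doc.interval, doc.next_check)

print(after_backoff, after_pause, after_abort)
```

In this sketch the pause leaves the backed-off interval in place, while the abort and restart puts the document back at the base interval and makes it due at once, which matches the "checked immediately, and relatively frequently for a while" description in the thread.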
