On 4 August 2014 10:41:04 CEST, Martin Vaeth <mar...@mvath.de> wrote:
>J. Roeleveld <jo...@antarean.org> wrote:
>>
>> With the kind of schedules I am working with (and I believe Alan will
>> also end up with), restarting the whole process from the start can
>> lead to issues.
>> Finding out how far the process got before the service crashed can
>> become rather complex.
>
>I am not sure whether I understand this correctly:

The schedules I am used to dealing with easily run for 8 to 14 hours, and 
occasionally for over a week.
When such a schedule stops halfway through, it can't simply be restarted from 
the beginning without risking massive consistency problems in the final data.

On top of that, several of those schedules start at random times, and 
occasionally a whole bunch of runs of the same schedule is put into the queue, 
each depending on the previous run.

If, during that time, one of the machines has a hardware failure or the 
scheduling process crashes on one or more of the servers, the last state needs 
to be recoverable.

If you have to clean up the environment and bring it back to a state where you 
can restart the schedules, it saves time if you know which commands and tasks 
were actually running at the time.

For this, the schedules, queues and current state for each node need to be 
stored on persistent storage.
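
As a rough illustration of what I mean (a minimal sketch only, not how any 
particular scheduler does it): each node keeps a small state file that is 
rewritten atomically whenever a task changes status, so after a crash you can 
list what was still running. The file layout and the names save_state, 
load_state, mark and interrupted_tasks are all made up for this example.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write the node state so that a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the real file. os.replace() is atomic on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Assumption: for full durability you would also fsync the directory;
    # that step is left out to keep the sketch short.


def load_state(path):
    """Return the stored state, or an empty one if nothing was saved yet."""
    if not os.path.exists(path):
        return {"tasks": {}}
    with open(path) as f:
        return json.load(f)


def mark(path, task_id, status, command=None):
    """Record a status change (e.g. 'running', 'done') for one task."""
    state = load_state(path)
    task = state["tasks"].setdefault(task_id, {})
    task["status"] = status
    if command is not None:
        task["command"] = command
    save_state(path, state)


def interrupted_tasks(path):
    """After a crash: the tasks that were still marked as running."""
    state = load_state(path)
    return sorted(
        task_id
        for task_id, info in state["tasks"].items()
        if info["status"] == "running"
    )
```

The scheduler would call mark() right before starting a command and again 
when it finishes. After a crash, interrupted_tasks() gives you the list of 
tasks to look at before you clean up and restart.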

Hope this clarifies it all a bit.
--
Joost


