I am curious about using Elixir to build a distributed cron system. Our
platform runs user-defined “flows” from a variety of IoT devices/services
(think Nest, SmartThings, Lifx, Fitbit, etc.) as well as digital services
(Twitter, Facebook, etc.). Here is an example: when it’s 9am or I turn my
car on at the house, dim my lights, and turn down the thermostat only if
the outside temperature is below 60; otherwise leave the thermostat at its
current level.


We need to keep track of these time-sensitive “jobs”. When one of our
brokers (each integration, e.g. Nest, Facebook, etc., has its own broker or
group of brokers) sends a new request in, our data router sends the cron
job to a "scheduler" node. The scheduler node schedules the job using
node-crontab; the job is basically a snapshot of the data that gets sent to
the processing engine. When the cron job needs to run, the payload held in
memory is sent for processing and execution. We currently have thousands of
jobs per node in memory, and unfortunately when a node goes down, so do its
jobs. We are trying to think through a way to have another node take over a
failed node's jobs (first, how a node or group of scheduler nodes gets
notified that a failure has happened, and to which node) without having to
check all the defined jobs in the central k/v store (Redis) to pick out the
ones belonging to the failed node. (When a job comes in, part of its key is
the hash of the scheduler node it was sent to.)
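To make the "without checking all the jobs" part concrete, here is a
minimal sketch of one possible layout: alongside the jobs themselves, keep
an index from scheduler-node hash to the ids of the jobs that node owns, so
a takeover touches only the failed node's jobs. It is written in Python
purely for illustration. The JobStore class and its method names are
hypothetical, and the dictionaries are an in-memory stand-in for the
central k/v store, not how our system works today.

```python
from collections import defaultdict


class JobStore:
    """In-memory stand-in for the central k/v store (hypothetical layout)."""

    def __init__(self):
        self.jobs = {}                    # job id -> payload snapshot
        self.owner = {}                   # job id -> hash of the owning node
        self.by_node = defaultdict(set)   # node hash -> ids of jobs it owns

    def add_job(self, job_id, node, payload):
        """Record a new job and index it under the node it was sent to."""
        self.jobs[job_id] = payload
        self.owner[job_id] = node
        self.by_node[node].add(job_id)

    def take_over(self, failed_node, new_node):
        """Reassign every job of failed_node to new_node.

        Only the failed node's index entry is read, so the cost depends on
        the number of jobs that node owned, not on the total job count.
        Returns the sorted ids of the jobs that moved.
        """
        moved = self.by_node.pop(failed_node, set())
        for job_id in moved:
            self.owner[job_id] = new_node
        self.by_node[new_node] |= moved
        return sorted(moved)
```

In Redis the same index could presumably be one set per node, filled with
SADD when a job is stored and read with SMEMBERS on takeover, but I have
not tried that against our data.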


A couple of people are leaning towards a ZooKeeper master/slave system to
solve the notification problem, but we would still be faced with how to
quickly have another node take over a failed node's jobs. Anyway, another
person and I have deployed Elixir for a few semi-critical services, but
nothing like what we might need to build a cron system like this.
Erlang/OTP/Elixir seems a perfect fit: perhaps each node keeps its jobs in
a local Agent while also writing them to Mnesia, or perhaps a cluster of
nodes keeps track of jobs in Mnesia, and when a node goes down another one
takes over that node's jobs by grabbing them from Mnesia and writing them
to its own cron. I don’t have much experience with these types of systems,
but it seems like a natural fit for Elixir and the OTP model. Any advice or
guidance would be very welcome.
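One open piece in the idea above is which surviving node does the taking
over. A possible answer, sketched here in Python for illustration only, is
rendezvous (highest-random-weight) hashing: every survivor computes the
same score for each candidate and the highest score wins, so the nodes
agree without talking to each other. This is a technique I am assuming
would fit, not something we run today, and the function names are
hypothetical. In an Elixir cluster the trigger could be the {:nodedown,
node} message delivered after calling :net_kernel.monitor_nodes(true).

```python
import hashlib


def score(node, failed_node):
    """Deterministic weight for the pair (candidate node, failed node)."""
    digest = hashlib.sha256(f"{node}|{failed_node}".encode()).hexdigest()
    return int(digest, 16)


def pick_successor(failed_node, live_nodes):
    """Return the surviving node that should adopt failed_node's jobs.

    Every node that evaluates this with the same membership view gets the
    same answer, regardless of the order of live_nodes.
    """
    candidates = [n for n in live_nodes if n != failed_node]
    if not candidates:
        raise ValueError("no surviving scheduler nodes")
    return max(candidates, key=lambda n: score(n, failed_node))


def should_take_over(me, failed_node, live_nodes):
    """True only on the one node chosen as successor."""
    return pick_successor(failed_node, live_nodes) == me
```

Each scheduler node would call should_take_over with its own name when it
learns of a failure; only the chosen node then loads the failed node's jobs
from the shared store. This only works if the survivors agree on who is
alive, which is the hard part in a network partition.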


As an example, here is something similar to what we want, written in Go:
http://dkron.io/

As far as I can tell, though, it does not have robust failover mechanisms.


Thanks,


Dan 
