If I recall correctly, the current implementation of the Mesos log requires callers to handle mutually exclusive access for reads and writes. This means that non-leading schedulers may not read from or write to the log to perform the check you describe.
What's the behavior of the scheduler when it starts and the log replica is non-VOTING? I thought the log open() call would fail, and the scheduler process would exit (giving a strong signal that the scheduler is not healthy).

On Fri, Jun 17, 2016 at 2:44 AM, Martin Hrabovčin <martin.hrabov...@gmail.com> wrote:

> Hello,
>
> I asked the same question in the #aurora channel and still haven't found an
> answer, so I am bringing it to the mailing list with a proposal.
>
> Is there a way to check the state of the mesos-log (whether it is writable,
> in VOTING state) through some HTTP check outside of the aurora process on a
> non-leading aurora instance? We are trying to create an external check that
> would determine whether the mesos-log is ready during an aurora rolling
> update. When adding a new instance to an existing aurora cluster, we want to
> make sure that the mesos-log is replicated and the replica is ready to serve
> reads and writes. Currently we're grepping the java process log and looking
> for "Persisted replica status to VOTING".
>
> I was pointed to the /vars endpoint, but I haven't found an obvious answer
> there.
>
> I'd like to propose creating a new HTTP endpoint "/loghealth" that would,
> similarly to "/leaderhealth", return 200 when the mesos-log is ready and 503
> when the mesos-log throws an exception. As for implementation, I was thinking
> about doing a simple read from the log or writing a noop to the log directly.
>
> Thanks!
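For reference, the status mapping in the proposal (200 when the log is ready, 503 when the check throws) could look roughly like the sketch below. This is only an illustration in Python, not Aurora code: the `probe` callable, `log_health_status`, and `make_handler` are hypothetical names, and "/loghealth" exists only as the proposed path.

```python
from http.server import BaseHTTPRequestHandler
from typing import Callable, Tuple


def log_health_status(probe: Callable[[], None]) -> Tuple[int, str]:
    """Return (200, "OK") if the probe completes, (503, reason) if it raises."""
    try:
        probe()  # hypothetical readiness check supplied by the caller
    except Exception as exc:
        return 503, f"log not ready: {exc}"
    return 200, "OK"


def make_handler(probe: Callable[[], None]):
    """Build a request handler that serves the proposed /loghealth path."""

    class LogHealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/loghealth":
                self.send_error(404)
                return
            status, body = log_health_status(probe)
            payload = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return LogHealthHandler
```

What `probe` actually does is the open question in this thread: given the mutual-exclusion requirement above, a probe that reads from or writes to the log on a non-leading scheduler may not be safe, so it would need some other signal of replica status.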