The timeout behavior sounds like a dangerous scalability tripwire. Consider
revisiting that approach.
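
For reference, the dedup idea described in the quoted message below could
be sketched roughly as follows. This is only an illustration, not the
scheduler's actual code: the class DedupQueue and its methods offer, poll
and done are hypothetical names, and keying on (task_id, uuid) assumes that
a retried status update carries the same uuid as the original.

```python
import threading
from collections import OrderedDict


class DedupQueue:
    """FIFO queue holding at most one entry per (task_id, uuid).

    Hypothetical sketch: assumes a retried status update carries the
    same uuid as the update it repeats.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = OrderedDict()  # key -> update, in arrival order
        self._in_flight = set()        # keys polled but not yet finished

    def offer(self, task_id, uuid, update):
        """Enqueue an update; return False if it is a duplicate."""
        key = (task_id, uuid)
        with self._lock:
            if key in self._pending or key in self._in_flight:
                return False  # already queued or being processed
            self._pending[key] = update
            return True

    def poll(self):
        """Pop the oldest pending update, or None if the queue is empty."""
        with self._lock:
            if not self._pending:
                return None
            key, update = self._pending.popitem(last=False)
            self._in_flight.add(key)
            return key, update

    def done(self, key):
        """Call after the ack attempt, whether it succeeded or failed."""
        with self._lock:
            self._in_flight.discard(key)

    def __len__(self):
        with self._lock:
            return len(self._pending) + len(self._in_flight)
```

A worker would call poll(), persist the update, send the ack, and then call
done(key) regardless of the ack outcome. If the ack failed, the next retry
of that update is accepted by offer() again.
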
On Sun, Oct 28, 2018 at 10:42 PM Varun Gupta
wrote:
> Mesos Version: 1.6
>
> The scheduler has 250k events in its queue: the Mesos Master sends status
> updates to the scheduler, and the scheduler stores them in the queue. The
> scheduler processes them in FIFO order, and once an update is processed
> (which includes persisting it to the DB) it acks the update. These updates
> are processed asynchronously with a thread pool of size 1000. We are using
> explicit reconciliation.
> If an ack to the Mesos Master times out due to high CPU usage, then the
> next ack will likely fail too. This slows down processing on the scheduler
> side, while the Mesos Master continues to send status updates (duplicates,
> since the old status updates have not been acked). Status updates then
> build up at the scheduler waiting to be processed, and we have seen the
> backlog grow up to 250k status updates.
>
> The timeout is on the explicit ack request from the scheduler to the Mesos
> Master.
>
> Mesos Master profiling: the next time this issue occurs, I will take a
> dump.
>
> Deduplication is for the status updates sitting in the scheduler's queue.
> The idea is to dedup duplicate status updates so that the scheduler
> processes a given pending status update only once and acks it to the Mesos
> Master only once. This would reduce the load on both the scheduler and the
> Mesos Master. After the ack (success or failure) the scheduler removes the
> status update from the queue, and in case of failure the Mesos Master will
> send the status update again.
>
>
>
> On Sun, Oct 28, 2018 at 10:15 PM Benjamin Mahler
> wrote:
>
> > Which version of Mesos are you running?
> >
> > > In framework, event updates grow up to 250k
> >
> > What does this mean? The scheduler has 250k events in its queue?
> >
> > > which leads to cascading effect on higher latency at Mesos Master (ack
> > > requests with 10s timeout)
> >
> > Can you send us perf stacks of the master during such a time window so
> > that we can see if there are any bottlenecks?
> > http://mesos.apache.org/documentation/latest/performance-profiling/
> >
> > Where is this timeout coming from and how is it used?
> >
> > > simultaneously explore if dedup is an option
> >
> > I don't know what you're referring to in terms of de-duplication. Can you
> > explain how the scheduler's status update processing works? Does it use
> > explicit acknowledgements and process batches asynchronously? Aurora
> > example: https://reviews.apache.org/r/33689/
> >
> > On Sun, Oct 28, 2018 at 8:58 PM Varun Gupta
> > wrote:
> >
> >> Hi Benjamin,
> >>
> >> In our batch workload use case, task churn is pretty high. We have seen
> >> 20-30k tasks launch within a 10 second window, with 100k+ tasks
> >> running.
> >>
> >> In framework, event updates grow up to 250k, which leads to cascading
> >> effect on higher latency at Mesos Master (ack requests with 10s timeout)
> >> as well as blocking the framework from processing new updates, since
> >> there are too many left to be acknowledged.
> >>
> >> Reconciliation runs every 30 mins, which also adds pressure on the event
> >> stream if too many updates are unacknowledged.
> >>
> >> I am thinking of experimenting with changing the default backoff period
> >> from 10s -> 30s or 60s, and simultaneously explore if dedup is an option.
> >>
> >> Thanks,
> >> Varun
> >>
> >> On Sun, Oct 28, 2018 at 6:49 PM Benjamin Mahler
> >> wrote:
> >>
> >> > Hi Varun,
> >> >
> >> > What problem are you trying to solve precisely? There seems to be an
> >> > implication that the duplicate acknowledgements are expensive. They
> >> > should be low cost, so that's rather surprising. Do you have any data
> >> > related to this?
> >> >
> >> > You can also tune the backoff rate on the agents, if the defaults are
> >> > too noisy in your setup.
> >> >
> >> > Ben
> >> >
> >> > On Sun, Oct 28, 2018 at 4:51 PM Varun Gupta wrote:
> >> >
> >> > >
> >> > > Hi,
> >> > >>
> >> > >> The Mesos agent will send status updates with exponential backoff
> >> > >> until an ack is received.
> >> > >>
> >> > >> If processing events at the framework and sending acks to the Master
> >> > >> is running slow, then back pressure builds up at the framework due
> >> > >> to duplicate updates for the same status.
> >> > >>
> >> > >> Has anyone explored deduplicating identical status update events at
> >> > >> the framework, and is it even advisable? The end goal is to dedup
> >> > >> all events and send only one ack back to the Master.
> >> > >>
> >> > >> Thanks,
> >> > >> Varun
> >> > >>
> >> > >>
> >> > >>
> >> >
> >>
> >
>
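
To make the retry behavior from the first message in this thread concrete,
here is a small sketch of an exponential backoff schedule. The 10s, 30s and
60s starting values come from the thread. The doubling factor, the 600s cap
and the function name retry_schedule are assumptions for illustration only
and are not taken from the Mesos source.

```python
def retry_schedule(initial_s=10.0, factor=2.0, cap_s=600.0, attempts=6):
    """Return the delay (in seconds) before each successive resend.

    All parameters are illustrative: factor and cap_s are assumed
    values, not confirmed agent defaults.
    """
    delays = []
    delay = initial_s
    for _ in range(attempts):
        delays.append(min(delay, cap_s))  # never wait longer than the cap
        delay *= factor                   # grow the wait after each miss
    return delays


if __name__ == "__main__":
    for start in (10.0, 30.0, 60.0):
        print(start, retry_schedule(initial_s=start))
```

Under these assumptions, a larger initial interval means fewer resends of
the same unacknowledged update within any fixed window, which is the effect
the proposed 10s -> 30s or 60s change is after.
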