Re: Performance working group meeting

Benjamin Mahler Wed, 26 Jul 2017 19:08:10 -0700

Thanks to those who joined! Notes were taken in the doc, I'll inline them
below for easier consumption. I'll also be looking into getting the video
published on the YouTube channel.


Notes:

-Attendee introductions: Benjamin Hindman, Chun-Hung Hsiao, Benjamin
Mahler, Greg Mann, James Peach, Ilya Pronin, Dario Rexin, Alexander
Rukletsov, Yan Xu, Dmitry Zhuk

-Ongoing libprocess optimizations
  -Benh: has been doing message-passing optimizations in libprocess
(MESOS-7798, includes flamegraphs)
  -Added a benchmark based on the Akka benchmark
  -Done in several phases, some of which introduce new opt-in paths
    -Phase 1: Opt-in lock-free run queue (enabled at configure time) and
other optimizations
    -Phase 2: Opt-in lock-free event queue (enabled at configure time) and
other optimizations
    -Phase 3: On Linux, the semaphore wasn't giving good performance, so
added a fixed-size LIFO semaphore (flamegraphs show results with and
without the LIFO semaphore)
    -Plan is to make the lock-free paths the default once users have run
them in test / prod
  -A lot more to do in libprocess, e.g. optimizing the HTTP request path
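The notes don't describe how the fixed-size LIFO semaphore works. As a rough sketch of the general idea only (not libprocess's code): blocked threads are kept on a fixed-capacity stack, and each signal wakes the most recently blocked thread. The usual motivation for LIFO wake-up is that the most recently blocked thread is the one most likely to still have warm caches. `LifoSemaphore` and `demoWakeOrder()` are hypothetical names, and this version uses a mutex, so unlike the lock-free work above it is not lock-free.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical LIFO semaphore sketch: blocked waiters sit on a
// fixed-capacity stack and signal() wakes the most recently blocked one.
class LifoSemaphore {
 public:
  explicit LifoSemaphore(std::size_t capacity) : capacity_(capacity) {
    waiters_.reserve(capacity);  // fixed size: no reallocation while waiting
  }

  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (count_ > 0) {  // a permit is banked; take it without blocking
      --count_;
      return;
    }
    assert(waiters_.size() < capacity_ && "more waiters than capacity");
    Waiter self;  // lives on this thread's stack until wait() returns
    waiters_.push_back(&self);
    self.cv.wait(lock, [&self] { return self.signaled; });
  }

  void signal() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (waiters_.empty()) {
      ++count_;  // nobody is blocked: bank the permit
      return;
    }
    Waiter* last = waiters_.back();  // most recently blocked waiter
    waiters_.pop_back();
    last->signaled = true;
    last->cv.notify_one();  // notified under the lock, so `last` is alive
  }

  // Number of threads currently blocked in wait().
  std::size_t parked() {
    std::lock_guard<std::mutex> lock(mutex_);
    return waiters_.size();
  }

 private:
  struct Waiter {
    std::condition_variable cv;
    bool signaled = false;
  };

  const std::size_t capacity_;
  std::mutex mutex_;
  std::vector<Waiter*> waiters_;
  std::size_t count_ = 0;
};

// Parks waiter A, then waiter B, then signals twice; returns wake order.
std::string demoWakeOrder() {
  LifoSemaphore sem(2);
  std::mutex orderMutex;
  std::string order;
  auto waiter = [&](char name) {
    sem.wait();
    std::lock_guard<std::mutex> lock(orderMutex);
    order += name;
  };

  std::thread a(waiter, 'A');
  while (sem.parked() < 1) std::this_thread::yield();
  std::thread b(waiter, 'B');
  while (sem.parked() < 2) std::this_thread::yield();

  sem.signal();  // wakes B, the last thread to block
  for (;;) {     // wait until B has recorded itself
    {
      std::lock_guard<std::mutex> lock(orderMutex);
      if (order.size() == 1) break;
    }
    std::this_thread::yield();
  }
  sem.signal();  // now wakes A
  a.join();
  b.join();
  return order;
}
```

With LIFO wake-up, `demoWakeOrder()` returns "BA"; a FIFO semaphore would return "AB".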

-Faster master failover work
  -Dmitry has done other libprocess optimizations to help with this
(MESOS-7713). There is a lot of copying of data (e.g. TaskInfo), something
like 7 copies. Protobuf move support is coming in one of the next protobuf
releases but is not available yet. Use moves to avoid these copies across
defer / dispatch boundaries. Two sets of patches:
    -Starting at r/60003: reduces the number of copies made when
dispatching. Mpark is helping with this and wanted to use something that
requires C++14 (hence the mailing list proposal).
    -Starting at r/60474: reduces copies in the protobuf message input
path. Using arenas means no moves.
    -Move support is said to be in protobuf 3.4.0:
https://github.com/google/protobuf/issues/2791
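To illustrate why moves matter across dispatch boundaries, here is a minimal sketch using a hypothetical `BigMessage` type that counts its own copies and a hypothetical single-threaded `eventQueue` standing in for an actor's queue (libprocess's real `dispatch` / `defer` are more involved). Capturing by copy duplicates the message; a C++14 init-capture moves it into the closure instead. If the message type has no move constructor (which, per the notes, is currently the case for protobuf messages), `std::move` silently falls back to copying.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a large message such as TaskInfo.
// It counts copies so the effect of moving is visible.
struct BigMessage {
  static int copies;
  std::string payload;

  explicit BigMessage(std::string p) : payload(std::move(p)) {}
  BigMessage(const BigMessage& other) : payload(other.payload) { ++copies; }
  BigMessage(BigMessage&&) = default;  // moves the string, no copy counted
};

int BigMessage::copies = 0;

// Hypothetical single-threaded event queue standing in for an actor's queue.
std::deque<std::function<void()>> eventQueue;
std::size_t lastPayloadSize = 0;

void handle(const BigMessage& message) {
  lastPayloadSize = message.payload.size();
}

// Capture by copy: the closure holds its own duplicate of the message.
void dispatchByCopy(const BigMessage& message) {
  eventQueue.emplace_back([message] { handle(message); });
}

// C++14 init-capture: the message is moved into the closure.
void dispatchByMove(BigMessage&& message) {
  eventQueue.emplace_back([message = std::move(message)] { handle(message); });
}

// Runs queued events in place (no std::function copies or moves).
void runAll() {
  while (!eventQueue.empty()) {
    eventQueue.front()();
    eventQueue.pop_front();
  }
}

int copiesWhenCopying() {
  BigMessage::copies = 0;
  BigMessage message(std::string(1 << 20, 'x'));
  dispatchByCopy(message);
  runAll();
  return BigMessage::copies;
}

int copiesWhenMoving() {
  BigMessage::copies = 0;
  BigMessage message(std::string(1 << 20, 'x'));
  dispatchByMove(std::move(message));
  runAll();
  return BigMessage::copies;
}
```

`copiesWhenCopying()` reports at least one copy of the 1 MiB payload, while `copiesWhenMoving()` reports none.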

-Discuss performance-related pain points
  -Master failover performance
  -Libprocess HTTP path
  -Master metrics are slow because collecting them requires round trips
through the actors’ queues
  -Master's HTTP API has lower throughput than the old API
  -Webui is slow to respond for large cluster / clusters with a lot of state
  -/state is slow to get a response (How long to generate within master’s
context? Also a lot of data to send over the wire?)

-Fill out the “planning” table:
  -Libprocess lock-free event queue: benh (writing patches), bmahler
(shepherding)
  -Move support for dispatch / defer: dmitry / mpark
  -Libprocess protobuf input path performance improvements: dmitry / benh
  -Libprocess HTTP path benchmark / optimizations
  -Use arenas for output protobuf messages in the master
  -HTTP API benchmark / optimizations
  -Webui performance improvements
  -Master failover benchmark (yan)
  -Chun: Using jemalloc (James: we always use jemalloc), can we make this
the default? (This runs into JNI problems though; maybe make it the default
when Java is disabled?)
  -Dario: The master process is doing a lot, e.g. parsing HTTP bodies.
Yan: Separate the master's history data from its run-time data?


On Fri, Jul 21, 2017 at 8:45 PM, Benjamin Mahler <bmah...@apache.org> wrote:

> I've scheduled the meeting for July 26th 10 am PST on the Apache Mesos
> calendar, this is a bit late for the BST folks but it was the only time
> slot available for myself and benh in the short term. Let me know if this
> doesn't work for a lot of you, so that we can reschedule if needed.
>
> Here is the agenda doc, I will update the agenda and the calendar entry
> with the Zoom details when I have those sorted out:
> https://docs.google.com/document/d/12hWGuzbqyNWc2l1ysbPcXwc0pzHEy4bodagrlNGCuQU/edit?usp=sharing
>
> On Fri, Jul 21, 2017 at 9:03 AM, Benjamin Mahler <bmah...@apache.org>
> wrote:
>
>> Since there have been several folks working on performance-related things
>> lately, I'd like to try to schedule a meeting; this could be recurring if
>> we find it useful.
>>
>> For an agenda, we could discuss:
>>
>> - ongoing work for libprocess optimizations and faster master failovers
>> - existing performance related pain points
>> - what people's priorities are, how much they can contribute
>>
>> Please join the #performance slack channel if you're interested in
>> general, and if you'd like to join the meeting please reply here and
>> include your time zone!
>>
>> Ben
>>
>
>
