[ https://issues.apache.org/jira/browse/MESOS-920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Mahler updated MESOS-920:
----------------------------------
    Description:
We've observed issues where the masters are slow to respond. Two perf traces were collected while the masters were slow to respond:

{noformat}
 25.84%  [kernel]     [k] default_send_IPI_mask_sequence_phys
 20.44%  [kernel]     [k] native_write_msr_safe
  4.54%  [kernel]     [k] _raw_spin_lock
  2.95%  libc-2.5.so  [.] _int_malloc
  1.82%  libc-2.5.so  [.] malloc
  1.55%  [kernel]     [k] apic_timer_interrupt
  1.36%  libc-2.5.so  [.] _int_free
{noformat}
{noformat}
 29.03%  [kernel]     [k] default_send_IPI_mask_sequence_phys
  9.64%  [kernel]     [k] _raw_spin_lock
  7.38%  [kernel]     [k] native_write_msr_safe
  2.43%  libc-2.5.so  [.] _int_malloc
  2.05%  libc-2.5.so  [.] _int_free
  1.67%  [kernel]     [k] apic_timer_interrupt
  1.58%  libc-2.5.so  [.] malloc
{noformat}

These have been attributed to the posix_fadvise calls made by glog. We can disable these via the environment:

{noformat}
GLOG_DEFINE_bool(drop_log_memory, true, "Drop in-memory buffers of log contents. "
                 "Logs can grow very quickly and they are rarely read before they "
                 "need to be evicted from memory. Instead, drop them from memory "
                 "as soon as they are flushed to disk.");
{noformat}
{code}
if (FLAGS_drop_log_memory) {
  if (file_length_ >= logging::kPageSize) {
    // don't evict the most recent page
    uint32 len = file_length_ & ~(logging::kPageSize - 1);
    posix_fadvise(fileno(file_), 0, len, POSIX_FADV_DONTNEED);
  }
}
{code}

We should set GLOG_drop_log_memory=false prior to making our call to google::InitGoogleLogging, to avoid others running into this issue.

  was:
We've observed performance scaling issues attributed to the posix_fadvise calls made by glog. This can currently only be disabled via the environment:

GLOG_DEFINE_bool(drop_log_memory, true, "Drop in-memory buffers of log contents. "
                 "Logs can grow very quickly and they are rarely read before they "
                 "need to be evicted from memory. Instead, drop them from memory "
                 "as soon as they are flushed to disk.");

if (FLAGS_drop_log_memory) {
  if (file_length_ >= logging::kPageSize) {
    // don't evict the most recent page
    uint32 len = file_length_ & ~(logging::kPageSize - 1);
    posix_fadvise(fileno(file_), 0, len, POSIX_FADV_DONTNEED);
  }
}

We should set GLOG_drop_log_memory=false prior to making our call to google::InitGoogleLogging.

> Set GLOG_drop_log_memory=false in environment prior to logging initialization.
> ------------------------------------------------------------------------------
>
>                 Key: MESOS-920
>                 URL: https://issues.apache.org/jira/browse/MESOS-920
>             Project: Mesos
>          Issue Type: Improvement
>          Components: technical debt
>    Affects Versions: 0.15.0, 0.16.0
>            Reporter: Benjamin Mahler
>
> We've observed issues where the masters are slow to respond. Two perf traces
> were collected while the masters were slow to respond:
> {noformat}
>  25.84%  [kernel]     [k] default_send_IPI_mask_sequence_phys
>  20.44%  [kernel]     [k] native_write_msr_safe
>   4.54%  [kernel]     [k] _raw_spin_lock
>   2.95%  libc-2.5.so  [.] _int_malloc
>   1.82%  libc-2.5.so  [.] malloc
>   1.55%  [kernel]     [k] apic_timer_interrupt
>   1.36%  libc-2.5.so  [.] _int_free
> {noformat}
> {noformat}
>  29.03%  [kernel]     [k] default_send_IPI_mask_sequence_phys
>   9.64%  [kernel]     [k] _raw_spin_lock
>   7.38%  [kernel]     [k] native_write_msr_safe
>   2.43%  libc-2.5.so  [.] _int_malloc
>   2.05%  libc-2.5.so  [.] _int_free
>   1.67%  [kernel]     [k] apic_timer_interrupt
>   1.58%  libc-2.5.so  [.] malloc
> {noformat}
> These have been attributed to the posix_fadvise calls made by glog. We can
> disable these via the environment:
> {noformat}
> GLOG_DEFINE_bool(drop_log_memory, true, "Drop in-memory buffers of log contents. "
>                  "Logs can grow very quickly and they are rarely read before they "
>                  "need to be evicted from memory. Instead, drop them from memory "
>                  "as soon as they are flushed to disk.");
> {noformat}
> {code}
> if (FLAGS_drop_log_memory) {
>   if (file_length_ >= logging::kPageSize) {
>     // don't evict the most recent page
>     uint32 len = file_length_ & ~(logging::kPageSize - 1);
>     posix_fadvise(fileno(file_), 0, len, POSIX_FADV_DONTNEED);
>   }
> }
> {code}
> We should set GLOG_drop_log_memory=false prior to making our call to
> google::InitGoogleLogging, to avoid others running into this issue.
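
A minimal sketch of the proposed change, under stated assumptions: the helper name defaultDropLogMemoryToFalse is hypothetical and is not part of Mesos or glog, and leaving an operator-supplied value untouched is an assumption that the description above does not state. The sketch only prepares the environment with POSIX setenv; it does not verify at what point glog actually reads GLOG_drop_log_memory, so whether the setting takes effect when applied at this stage would need to be confirmed against the glog build in use.

```cpp
#include <stdlib.h>  // setenv, getenv (POSIX)
#include <string>

// Hypothetical helper (an assumption for illustration, not Mesos or glog API).
// Defaults GLOG_drop_log_memory to "false" unless a value is already present
// in the environment, and returns the value that ends up being set.
inline std::string defaultDropLogMemoryToFalse() {
  // overwrite = 0: an existing setting chosen by the operator is left as-is.
  if (::setenv("GLOG_drop_log_memory", "false", 0) != 0) {
    return std::string();  // setenv failed (for example, out of memory)
  }
  const char* value = ::getenv("GLOG_drop_log_memory");
  return value != nullptr ? std::string(value) : std::string();
}

// Intended call order in the startup path. The glog call is shown only as a
// comment so that the sketch compiles without glog installed:
//
//   defaultDropLogMemoryToFalse();
//   google::InitGoogleLogging(argv[0]);
```

Passing overwrite = 0 keeps an explicit GLOG_drop_log_memory=true working for anyone who still wants the page-cache eviction behavior.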