[
https://issues.apache.org/jira/browse/MESOS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15843632#comment-15843632
]
James Peach edited comment on MESOS-7017 at 1/27/17 11:10 PM:
--------------------------------------------------------------
Here's the partial stack trace:
{noformat}
#2 0x00007fb830734696 in google::DumpStackTraceAndExit () at src/utilities.cc:147
#3 0x00007fb83072c08d in google::LogMessage::Fail () at src/logging.cc:1458
#4 0x00007fb83072de1d in google::LogMessage::SendToLog (this=Unhandled dwarf expression opcode 0xf3
) at src/logging.cc:1412
#5 0x00007fb83072bc12 in google::LogMessage::Flush (this=0x7fb8227f3890) at src/logging.cc:1281
#6 0x00007fb83072e7f9 in google::LogMessageFatal::~LogMessageFatal (this=Unhandled dwarf expression opcode 0xf3
) at src/logging.cc:1984
#7 0x00007fb82fb35113 in evolve<mesos::v1::master::Response> (response=...) at ../../src/internal/evolve.cpp:63
#8 mesos::internal::evolve (response=...) at ../../src/internal/evolve.cpp:218
#9 0x00007fb82fba8dd6 in mesos::internal::master::Master::Http::<lambda(const std::tuple<process::Owned<mesos::ObjectApprover>, process::Owned<mesos::ObjectApprover> >&)>::operator()(const std::tuple<process::Owned<mesos::ObjectApprover>, process::Owned<mesos::ObjectApprover> > &) const (__closure=0x7fb720c7a940, approvers=Unhandled dwarf expression opcode 0xf3
)
at ../../src/master/http.cpp:3772
#10 0x00007fb82fba9068 in operator() (__functor=Unhandled dwarf expression opcode 0xf3
) at ../../3rdparty/libprocess/include/process/deferred.hpp:225
#11 std::_Function_handler<process::Future<process::http::Response>(), process::_Deferred<G>::operator std::function<R(P0)>() const::<lambda(P0)> [with R = process::Future<process::http::Response>; P0 = const std::tuple<process::Owned<mesos::ObjectApprover>, process::Owned<mesos::ObjectApprover> >&; F = mesos::internal::master::Master::Http::getTasks(const mesos::master::Call&, const Option<std::basic_string<char> >&, mesos::ContentType) const::<lambda(const std::tuple<process::Owned<mesos::ObjectApprover>, process::Owned<mesos::ObjectApprover> >&)>]::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=Unhandled dwarf expression opcode 0xf3
) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2025
#12 0x00007fb82faf73b3 in operator() (__functor=Unhandled dwarf expression opcode 0xf3
) at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2439
#13 operator() (__functor=Unhandled dwarf expression opcode 0xf3
) at ../../3rdparty/libprocess/include/process/dispatch.hpp:112
#14 std::_Function_handler<void(process::ProcessBase*), process::internal::Dispatch<process::Future<T> >::operator()(const process::UPID&, F&&) [with F = std::function<process::Future<process::http::Response>()>&; R = process::http::Response]::<lambda(process::ProcessBase*)> >::_M_invoke(const std::_Any_data &, process::ProcessBase *) (__functor=Unhandled dwarf expression opcode 0xf3
)
at /opt/rh/devtoolset-3/root/usr/include/c++/4.9.2/functional:2039
{noformat}
So the proximate cause of the crash is that {{evolve}} does a bidirectional
serialization. For large messages this causes two large allocations even if it
doesn't trigger the {{CHECK}}.
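A minimal sketch of the round-trip pattern described above, not the actual Mesos code. {{Message}}, {{evolveByRoundTrip}} and the {{limit}} parameter are hypothetical stand-ins for the protobuf types and the parser's size limit; the only value taken from the issue is the 67108864-byte limit reported by libprotobuf. The real {{evolve}} aborts via {{CHECK}} on a failed parse, whereas this sketch just returns {{false}}.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a protobuf message; the payload plays the role
// of the message contents and of its wire format.
struct Message {
  std::string payload;

  // Allocation 1: serializing produces a full copy of the message.
  std::string SerializeAsString() const { return payload; }

  // Allocation 2: parsing copies the data again. Inputs larger than `limit`
  // are rejected, mimicking the parser's total-bytes limit.
  bool ParsePartialFromString(const std::string& data, std::size_t limit) {
    if (data.size() > limit) {
      return false;
    }
    payload = data;
    return true;
  }
};

// The limit reported in the libprotobuf error: 64 MiB.
constexpr std::size_t kDefaultLimit = 64 * 1024 * 1024;
static_assert(kDefaultLimit == 67108864, "64 MiB is 67108864 bytes");

// Converts `from` into `to` by serializing and reparsing. Returns false
// where the real code would hit the CHECK.
bool evolveByRoundTrip(
    const Message& from,
    Message* to,
    std::size_t limit = kDefaultLimit) {
  const std::string data = from.SerializeAsString();
  return to->ParsePartialFromString(data, limit);
}
```

Under these assumptions, any message whose serialized form exceeds the limit fails the conversion regardless of how small the originating request was, and every successful conversion still pays for two copies of the message.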
was (Author: jamespeach):
So the proximate cause of the crash is that {{evolve}} does an unnecessary
bidirectional serialization. For large messages this causes two unnecessary large
allocations even if it doesn't trigger the {{CHECK}}.
> HTTP API responses can crash the master.
> ----------------------------------------
>
> Key: MESOS-7017
> URL: https://issues.apache.org/jira/browse/MESOS-7017
> Project: Mesos
> Issue Type: Bug
> Components: HTTP API
> Reporter: James Peach
> Priority: Critical
>
> The master can crash when generating large responses to small API requests.
> One manifestation of this is querying the tasks.
> {noformat}
> [libprotobuf ERROR google/protobuf/io/coded_stream.cc:180] A protocol message
> was rejected because it was too big (more than 67108864 bytes). To increase
> the limit (or to disable these warnings), see
> CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
> F0126 18:34:18.790386 26230 evolve.cpp:63] Check failed:
> t.ParsePartialFromString(data) Failed to parse mesos.v1.master.Response while
> evolving from mesos.master.Response
> {noformat}