> On Feb. 27, 2014, 7:09 p.m., Ben Mahler wrote: > > Hey Dominic, I found an issue applying these Option patches, looks like > > slave.cpp:3001 ( > > https://github.com/apache/mesos/blob/20a6993e1b50bf46089cd5e059634ac446b08ac2/src/slave/slave.cpp#L3001 > > ) is problematic: > > > > struct ExecutorState > > { > > ExecutorID id; > > Option<ExecutorInfo> info; > > Option<ContainerID> latest; > > hashmap<ContainerID, RunState> runs; > > unsigned int errors; > > }; > > > > void Framework::recoverExecutor(const ExecutorState& state) > > { > > ... > > const RunState& run = state.runs.get(latest).get(); > > <----- rvalue -------> > > < lvalue of temp Option::t > > > } > > > > I'm not 100% clear on what the standard specifies here, but it doesn't look > > like we're extending the life of a temporary here, since we're grabbing a > > _member_ from a temporary. > > > > [ RUN ] SlaveRecoveryTest/0.RecoverStatusUpdateManager > > Detaching after fork from child process 51693. > > Detaching after fork from child process 51694. > > Detaching after fork from child process 51708. > > Detaching after fork from child process 51709. > > WARNING: Logging before InitGoogleLogging() is written to STDERR > > I0227 18:46:06.952373 51709 exec.cpp:131] Version: 0.19.0 > > I0227 18:46:06.956939 51742 exec.cpp:205] Executor registered on slave > > 2014-02-27-18:46:06-1740121354-59276-51387-0 > > Registered executor on smfd-bkq-03-sr4.devel.twitter.com > > Starting task 9fe870d9-4b47-489f-b863-2a929ade31a8 > > sh -c 'sleep 1000' > > Forked command at 51749 > > > > Program received signal SIGSEGV, Segmentation fault. > > [Switching to Thread 0x7fffecb03940 (LWP 51403)] > > 0x0000000000a09656 in > > boost::unordered::detail::table<boost::unordered::detail::map<std::allocator<std::pair<mesos::TaskID > > const, mesos::internal::slave::state::TaskState> >, mesos::TaskID, > > mesos::internal::slave::state::TaskState, boost::hash<mesos::TaskID>, > > std::equal_to<mesos::TaskID> > >::begin (this=0x7fffd803d810) at > > ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/unordered/detail/table.hpp:249 > > 249 return size_ ? iterator(get_previous_start()->next_) : > > iterator(); > > (gdb) where > > #0 0x0000000000a09656 in > > boost::unordered::detail::table<boost::unordered::detail::map<std::allocator<std::pair<mesos::TaskID > > const, mesos::internal::slave::state::TaskState> >, mesos::TaskID, > > mesos::internal::slave::state::TaskState, boost::hash<mesos::TaskID>, > > std::equal_to<mesos::TaskID> > >::begin (this=0x7fffd803d810) at > > ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/unordered/detail/table.hpp:249 > > #1 0x00007ffff62deccb in boost::unordered::unordered_map<mesos::TaskID, > > mesos::internal::slave::state::TaskState, boost::hash<mesos::TaskID>, > > std::equal_to<mesos::TaskID>, std::allocator<std::pair<mesos::TaskID const, > > mesos::internal::slave::state::TaskState> > >::begin (this=0x7fffd803d810) > > at > > ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/unordered/unordered_map.hpp:209 > > #2 0x00007ffff62decf7 in > > boost::range_detail::range_begin<hashmap<mesos::TaskID, > > mesos::internal::slave::state::TaskState> const> (c=...) > > at ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/range/begin.hpp:49 > > #3 0x00007ffff62ded0f in > > boost::range_adl_barrier::begin<hashmap<mesos::TaskID, > > mesos::internal::slave::state::TaskState> > (r=...) at > > ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/range/begin.hpp:119 > > #4 0x00007ffff62ded37 in > > boost::foreach_detail_::begin<hashmap<mesos::TaskID, > > mesos::internal::slave::state::TaskState>, mpl_::bool_<true> > (col=...) > > at ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/foreach.hpp:674 > > #5 0x00007ffff62a4868 in > > mesos::internal::slave::Framework::recoverExecutor (this=0x7fffd804d6c0, > > state=...) at ../../src/slave/slave.cpp:3025 > > #6 0x00007ffff62aa668 in mesos::internal::slave::Slave::recoverFramework > > (this=0x7fffdc01c230, state=...) at ../../src/slave/slave.cpp:2767 > > #7 0x00007ffff62ab115 in mesos::internal::slave::Slave::recover > > (this=0x7fffdc01c230, _state=...) at ../../src/slave/slave.cpp:2576 > > #8 0x00007ffff62c77c4 in std::tr1::_Mem_fn<process::Future<Nothing> > > (mesos::internal::slave::Slave::*)(Result<mesos::internal::slave::state::SlaveState> > > const&)>::operator() (this=0x7fffe0036a40, > > __object=0x7fffdc01c230, __a1=...) at > > /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:214 > > #9 0x00007ffff62cbdb1 in > > std::tr1::_Bind<std::tr1::_Mem_fn<process::Future<Nothing> > > (mesos::internal::slave::Slave::*)(const > > Result<mesos::internal::slave::state::SlaveState>&)> > > ()(std::tr1::_Placeholder<1>, > > Result<mesos::internal::slave::state::SlaveState>)>::operator()<mesos::internal::slave::Slave*>(mesos::internal::slave::Slave > > *&) (this=0x7fffe0036a40, __u1=@0x7fffecb02d40) > > at > > /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/bind_iterate.h:45 > > #10 0x00007ffff62cbdf1 in > > std::tr1::_Function_handler<process::Future<Nothing> > > ()(mesos::internal::slave::Slave*),std::tr1::_Bind<std::tr1::_Mem_fn<process::Future<Nothing> > > (mesos::internal::slave::Slave::*)(const > > Result<mesos::internal::slave::state::SlaveState>&)> > > ()(std::tr1::_Placeholder<1>, > > Result<mesos::internal::slave::state::SlaveState>)> >::_M_invoke(const > > std::tr1::_Any_data &, mesos::internal::slave::Slave *) (__functor=..., > > __a1=0x7fffdc01c230) at > > /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:488 > > #11 0x00007ffff62e1b4d in std::tr1::function<process::Future<Nothing> > > ()(mesos::internal::slave::Slave*)>::operator()(mesos::internal::slave::Slave > > *) const (this=0x7fffe0036760, __a1=0x7fffdc01c230) > > at > > /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:868 > > #12 0x00007ffff62f8a5d in process::internal::pdispatcher<Nothing, > > mesos::internal::slave::Slave>(process::ProcessBase *, > > std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing> > > ()(mesos::internal::slave::Slave*)> >, > > std::tr1::shared_ptr<process::Promise<Nothing> >) (process=0x7fffdc01c590, > > thunk=std::tr1::shared_ptr (count 2) 0x7fffecb02e50, > > promise=std::tr1::shared_ptr (count -536638560) 0x7fffecb02e30) > > at ../../3rdparty/libprocess/include/process/dispatch.hpp:88 > > #13 0x00007ffff62eedd5 in std::tr1::_Bind<void (* > > ()(std::tr1::_Placeholder<1>, > > std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing> > > ()(mesos::internal::slave::Slave*)> >, > > std::tr1::shared_ptr<process::Promise<Nothing> >))(process::ProcessBase*, > > std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing> > > ()(mesos::internal::slave::Slave*)> >, > > std::tr1::shared_ptr<process::Promise<Nothing> > > >)>::operator()<process::ProcessBase*>(process::ProcessBase *&) > > (this=0x7fffe002d340, __u1=@0x7fffecb02ea0) at > > /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/bind_iterate.h:45 > > #14 0x00007ffff62eee4d in std::tr1::_Function_handler<void > > ()(process::ProcessBase*),std::tr1::_Bind<void (* > > ()(std::tr1::_Placeholder<1>, > > std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing> > > ()(mesos::internal::slave::Slave*)> >, > > std::tr1::shared_ptr<process::Promise<Nothing> >))(process::ProcessBase*, > > std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing> > > ()(mesos::internal::slave::Slave*)> >, > > std::tr1::shared_ptr<process::Promise<Nothing> >)> >::_M_invoke(const > > std::tr1::_Any_data &, process::ProcessBase *) (__functor=..., > > __a1=0x7fffdc01c590) > > at > > /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:502 > > #15 0x00007ffff66bcded in std::tr1::function<void > > ()(process::ProcessBase*)>::operator()(process::ProcessBase *) const > > (this=0x7fffe0035500, __a1=0x7fffdc01c590) > > at > > /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:868 > > #16 0x00007ffff6679593 in process::ProcessBase::visit (this=0x7fffdc01c590, > > event=...) at ../../../3rdparty/libprocess/src/process.cpp:3183 > > #17 0x00007ffff6690e3e in process::DispatchEvent::visit > > (this=0x7fffe0035580, visitor=0x7fffdc01c590) at > > ../../../3rdparty/libprocess/include/process/event.hpp:136 > > #18 0x000000000066b1fa in process::ProcessBase::serve (this=0x7fffdc01c590, > > event=...) at ../../3rdparty/libprocess/include/process/process.hpp:38 > > #19 0x00007ffff66840f4 in process::ProcessManager::resume (this=0x10c41d0, > > process=0x7fffdc01c590) at ../../../3rdparty/libprocess/src/process.cpp:2611 > > #20 0x00007ffff6684948 in process::schedule (arg=0x0) at > > ../../../3rdparty/libprocess/src/process.cpp:1305 > > #21 0x00007ffff4ffe83d in start_thread () from /lib64/libpthread.so.0 > > #22 0x00007ffff3d6526d in clone () from /lib64/libc.so.6 > > Dominic Hamon wrote: > If i read this right, the hashmap is returning a copy of an Option<T> and > then we call get() on that which should be returning a const RunState&. There > shouldn't be any need to extend any lifetime. > > If the hashmap was returning a reference to an Option, I can see the > problem. What am I missing? > > Ben Mahler wrote: > Previously, we were binding to an rvalue RunState, as Option::get was > returning a copy. The standard guarantees that the lifetime of this RunState > temporary would be extended (8.5.3 of the standard). > > Now Option::get is returning an rvalue of a RunState&, this reference > lives inside the intermediate temporary Option in the expression. The > lifetime of this Option _must_ be extended for the RunState reference to > remain valid. But, I don't think the standard ensure this since the Option > rvalue is an intermediate part of the expression. 12.2 of the standard is > interesting here, but it's important to note that get() is returning a > _reference_, not an object. > > Dominic Hamon wrote: > the fix for the callsite is to put the Option into an explicit temporary > as a copy, however this makes me uncomfortable as the error is a runtime one. > Ie, it's not obvious to someone that writing it as written is invalid. > > Another option is to tighten up hashmap so that get returns the value by > reference, not an Option, and it's illegal to call 'get' when the key doesn't > exist. Similar to C++11's map::at() method. > > The final option is to not change the Option return to a reference.
What about just making a copy in the caller? :) const RunState& run = state.runs.get(latest).get(); becomes: RunState run = state.runs.get(latest).get(); const RunState run = state.runs.get(latest).get(); // If const semantics are strongly desired. - Ben ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18386/#review35673 ----------------------------------------------------------- On Feb. 27, 2014, 5:12 p.m., Dominic Hamon wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/18386/ > ----------------------------------------------------------- > > (Updated Feb. 27, 2014, 5:12 p.m.) > > > Review request for mesos and Ben Mahler. > > > Bugs: MESOS-1008 > https://issues.apache.org/jira/browse/MESOS-1008 > > > Repository: mesos-git > > > Description > ------- > > See summary > > > Diffs > ----- > > src/linux/fs.hpp 1d86dd0d24c3daae957b5eec387638d1e8e6d7db > src/linux/fs.cpp e5f4f9a16becd4e5960d0cbb7f988736188b2426 > src/log/log.cpp 62dc9286285d5981aa5fc63e125d3c3f51d4f457 > src/sched/sched.cpp dcb3158d2acd93a107fda51b19e92899a092e628 > > Diff: https://reviews.apache.org/r/18386/diff/ > > > Testing > ------- > > make check > > 'grep' for cases where Options are reassigned after references are taken. > > > Thanks, > > Dominic Hamon > >