On Thu, Feb 27, 2014 at 1:53 PM, Ben Mahler <benjamin.mah...@gmail.com>wrote:

>
>
> > On Feb. 27, 2014, 7:09 p.m., Ben Mahler wrote:
> > > Hey Dominic, I found an issue applying these Option patches, looks
> like slave.cpp:3001 (
> https://github.com/apache/mesos/blob/20a6993e1b50bf46089cd5e059634ac446b08ac2/src/slave/slave.cpp#L3001)
>  is problematic:
> > >
> > > struct ExecutorState
> > > {
> > >   ExecutorID id;
> > >   Option<ExecutorInfo> info;
> > >   Option<ContainerID> latest;
> > >   hashmap<ContainerID, RunState> runs;
> > >   unsigned int errors;
> > > };
> > >
> > > void Framework::recoverExecutor(const ExecutorState& state)
> > > {
> > >   ...
> > >   const RunState& run = state.runs.get(latest).get();
> > >                         <----- rvalue ------->
> > >                         < lvalue of temp Option::t >
> > > }
> > >
> > > I'm not 100% clear on what the standard specifies here, but it doesn't
> look like we're extending the life of a temporary here, since we're
> grabbing a _member_ from a temporary.
> > >
> > > [ RUN      ] SlaveRecoveryTest/0.RecoverStatusUpdateManager
> > > Detaching after fork from child process 51693.
> > > Detaching after fork from child process 51694.
> > > Detaching after fork from child process 51708.
> > > Detaching after fork from child process 51709.
> > > WARNING: Logging before InitGoogleLogging() is written to STDERR
> > > I0227 18:46:06.952373 51709 exec.cpp:131] Version: 0.19.0
> > > I0227 18:46:06.956939 51742 exec.cpp:205] Executor registered on slave
> 2014-02-27-18:46:06-1740121354-59276-51387-0
> > > Registered executor on smfd-bkq-03-sr4.devel.twitter.com
> > > Starting task 9fe870d9-4b47-489f-b863-2a929ade31a8
> > > sh -c 'sleep 1000'
> > > Forked command at 51749
> > >
> > > Program received signal SIGSEGV, Segmentation fault.
> > > [Switching to Thread 0x7fffecb03940 (LWP 51403)]
> > > 0x0000000000a09656 in
> boost::unordered::detail::table<boost::unordered::detail::map<std::allocator<std::pair<mesos::TaskID
> const, mesos::internal::slave::state::TaskState> >, mesos::TaskID,
> mesos::internal::slave::state::TaskState, boost::hash<mesos::TaskID>,
> std::equal_to<mesos::TaskID> > >::begin (this=0x7fffd803d810) at
> ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/unordered/detail/table.hpp:249
> > > 249             return size_ ? iterator(get_previous_start()->next_) :
> iterator();
> > > (gdb) where
> > > #0  0x0000000000a09656 in
> boost::unordered::detail::table<boost::unordered::detail::map<std::allocator<std::pair<mesos::TaskID
> const, mesos::internal::slave::state::TaskState> >, mesos::TaskID,
> mesos::internal::slave::state::TaskState, boost::hash<mesos::TaskID>,
> std::equal_to<mesos::TaskID> > >::begin (this=0x7fffd803d810) at
> ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/unordered/detail/table.hpp:249
> > > #1  0x00007ffff62deccb in
> boost::unordered::unordered_map<mesos::TaskID,
> mesos::internal::slave::state::TaskState, boost::hash<mesos::TaskID>,
> std::equal_to<mesos::TaskID>, std::allocator<std::pair<mesos::TaskID const,
> mesos::internal::slave::state::TaskState> > >::begin (this=0x7fffd803d810)
> at
> ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/unordered/unordered_map.hpp:209
> > > #2  0x00007ffff62decf7 in
> boost::range_detail::range_begin<hashmap<mesos::TaskID,
> mesos::internal::slave::state::TaskState> const> (c=...)
> > >     at
> ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/range/begin.hpp:49
> > > #3  0x00007ffff62ded0f in
> boost::range_adl_barrier::begin<hashmap<mesos::TaskID,
> mesos::internal::slave::state::TaskState> > (r=...) at
> ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/range/begin.hpp:119
> > > #4  0x00007ffff62ded37 in
> boost::foreach_detail_::begin<hashmap<mesos::TaskID,
> mesos::internal::slave::state::TaskState>, mpl_::bool_<true> > (col=...)
> > >     at
> ../3rdparty/libprocess/3rdparty/boost-1.53.0/boost/foreach.hpp:674
> > > #5  0x00007ffff62a4868 in
> mesos::internal::slave::Framework::recoverExecutor (this=0x7fffd804d6c0,
> state=...) at ../../src/slave/slave.cpp:3025
> > > #6  0x00007ffff62aa668 in
> mesos::internal::slave::Slave::recoverFramework (this=0x7fffdc01c230,
> state=...) at ../../src/slave/slave.cpp:2767
> > > #7  0x00007ffff62ab115 in mesos::internal::slave::Slave::recover
> (this=0x7fffdc01c230, _state=...) at ../../src/slave/slave.cpp:2576
> > > #8  0x00007ffff62c77c4 in std::tr1::_Mem_fn<process::Future<Nothing>
> (mesos::internal::slave::Slave::*)(Result<mesos::internal::slave::state::SlaveState>
> const&)>::operator() (this=0x7fffe0036a40,
> > >     __object=0x7fffdc01c230, __a1=...) at
> /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:214
> > > #9  0x00007ffff62cbdb1 in
> std::tr1::_Bind<std::tr1::_Mem_fn<process::Future<Nothing>
> (mesos::internal::slave::Slave::*)(const
> Result<mesos::internal::slave::state::SlaveState>&)>
> ()(std::tr1::_Placeholder<1>,
> Result<mesos::internal::slave::state::SlaveState>)>::operator()<mesos::internal::slave::Slave*>(mesos::internal::slave::Slave
> *&) (this=0x7fffe0036a40, __u1=@0x7fffecb02d40)
> > >     at
> /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/bind_iterate.h:45
> > > #10 0x00007ffff62cbdf1 in
> std::tr1::_Function_handler<process::Future<Nothing>
> ()(mesos::internal::slave::Slave*),std::tr1::_Bind<std::tr1::_Mem_fn<process::Future<Nothing>
> (mesos::internal::slave::Slave::*)(const
> Result<mesos::internal::slave::state::SlaveState>&)>
> ()(std::tr1::_Placeholder<1>,
> Result<mesos::internal::slave::state::SlaveState>)> >::_M_invoke(const
> std::tr1::_Any_data &, mesos::internal::slave::Slave *) (__functor=...,
> __a1=0x7fffdc01c230) at
> /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:488
> > > #11 0x00007ffff62e1b4d in std::tr1::function<process::Future<Nothing>
> ()(mesos::internal::slave::Slave*)>::operator()(mesos::internal::slave::Slave
> *) const (this=0x7fffe0036760, __a1=0x7fffdc01c230)
> > >     at
> /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:868
> > > #12 0x00007ffff62f8a5d in process::internal::pdispatcher<Nothing,
> mesos::internal::slave::Slave>(process::ProcessBase *,
> std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing>
> ()(mesos::internal::slave::Slave*)> >,
> std::tr1::shared_ptr<process::Promise<Nothing> >) (process=0x7fffdc01c590,
> thunk=std::tr1::shared_ptr (count 2) 0x7fffecb02e50,
> promise=std::tr1::shared_ptr (count -536638560) 0x7fffecb02e30)
> > >     at ../../3rdparty/libprocess/include/process/dispatch.hpp:88
> > > #13 0x00007ffff62eedd5 in std::tr1::_Bind<void (*
> ()(std::tr1::_Placeholder<1>,
> std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing>
> ()(mesos::internal::slave::Slave*)> >,
> std::tr1::shared_ptr<process::Promise<Nothing> >))(process::ProcessBase*,
> std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing>
> ()(mesos::internal::slave::Slave*)> >,
> std::tr1::shared_ptr<process::Promise<Nothing>
> >)>::operator()<process::ProcessBase*>(process::ProcessBase *&)
> (this=0x7fffe002d340, __u1=@0x7fffecb02ea0) at
> /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/bind_iterate.h:45
> > > #14 0x00007ffff62eee4d in std::tr1::_Function_handler<void
> ()(process::ProcessBase*),std::tr1::_Bind<void (*
> ()(std::tr1::_Placeholder<1>,
> std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing>
> ()(mesos::internal::slave::Slave*)> >,
> std::tr1::shared_ptr<process::Promise<Nothing> >))(process::ProcessBase*,
> std::tr1::shared_ptr<std::tr1::function<process::Future<Nothing>
> ()(mesos::internal::slave::Slave*)> >,
> std::tr1::shared_ptr<process::Promise<Nothing> >)> >::_M_invoke(const
> std::tr1::_Any_data &, process::ProcessBase *) (__functor=...,
> __a1=0x7fffdc01c590)
> > >     at
> /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:502
> > > #15 0x00007ffff66bcded in std::tr1::function<void
> ()(process::ProcessBase*)>::operator()(process::ProcessBase *) const
> (this=0x7fffe0035500, __a1=0x7fffdc01c590)
> > >     at
> /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/tr1/functional_iterate.h:868
> > > #16 0x00007ffff6679593 in process::ProcessBase::visit
> (this=0x7fffdc01c590, event=...) at
> ../../../3rdparty/libprocess/src/process.cpp:3183
> > > #17 0x00007ffff6690e3e in process::DispatchEvent::visit
> (this=0x7fffe0035580, visitor=0x7fffdc01c590) at
> ../../../3rdparty/libprocess/include/process/event.hpp:136
> > > #18 0x000000000066b1fa in process::ProcessBase::serve
> (this=0x7fffdc01c590, event=...) at
> ../../3rdparty/libprocess/include/process/process.hpp:38
> > > #19 0x00007ffff66840f4 in process::ProcessManager::resume
> (this=0x10c41d0, process=0x7fffdc01c590) at
> ../../../3rdparty/libprocess/src/process.cpp:2611
> > > #20 0x00007ffff6684948 in process::schedule (arg=0x0) at
> ../../../3rdparty/libprocess/src/process.cpp:1305
> > > #21 0x00007ffff4ffe83d in start_thread () from /lib64/libpthread.so.0
> > > #22 0x00007ffff3d6526d in clone () from /lib64/libc.so.6
> >
> > Dominic Hamon wrote:
> >     If i read this right, the hashmap is returning a copy of an
> Option<T> and then we call get() on that which should be returning a const
> RunState&. There shouldn't be any need to extend any lifetime.
> >
> >     If the hashmap was returning a reference to an Option, I can see the
> problem. What am I missing?
> >
> > Ben Mahler wrote:
> >     Previously, we were binding to an rvalue RunState, as Option::get
> was returning a copy. The standard guarantees that the lifetime of this
> RunState temporary would be extended (8.5.3 of the standard).
> >
> >     Now Option::get is returning an rvalue of a RunState&, this
> reference lives inside the intermediate temporary Option in the expression.
> The lifetime of this Option _must_ be extended for the RunState reference
> to remain valid. But, I don't think the standard ensure this since the
> Option rvalue is an intermediate part of the expression. 12.2 of the
> standard is interesting here, but it's important to note that get() is
> returning a _reference_, not an object.
> >
> > Dominic Hamon wrote:
> >     the fix for the callsite is to put the Option into an explicit
> temporary as a copy, however this makes me uncomfortable as the error is a
> runtime one. Ie, it's not obvious to someone that writing it as written is
> invalid.
> >
> >     Another option is to tighten up hashmap so that get returns the
> value by reference, not an Option, and it's illegal to call 'get' when the
> key doesn't exist. Similar to C++11's map::at() method.
> >
> >     The final option is to not change the Option return to a reference.
>
> What about just making a copy in the caller? :)
>
> const RunState& run = state.runs.get(latest).get();
>
> becomes:
>
> RunState run = state.runs.get(latest).get();
> const RunState run = state.runs.get(latest).get(); // If const semantics
> are strongly desired.
>
>
right, we could do. However, this is not obvious. Most C++ devs will get
by const reference by default to avoid a copy, and this will fail at
runtime rather than compile time which makes me uncomfortable.

Also, if the object is expensive to copy/can't be copied, we're going to
run into issues with this approach.



>
> - Ben
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18386/#review35673
> -----------------------------------------------------------
>
>
> On Feb. 27, 2014, 5:12 p.m., Dominic Hamon wrote:
> >
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > https://reviews.apache.org/r/18386/
> > -----------------------------------------------------------
> >
> > (Updated Feb. 27, 2014, 5:12 p.m.)
> >
> >
> > Review request for mesos and Ben Mahler.
> >
> >
> > Bugs: MESOS-1008
> >     https://issues.apache.org/jira/browse/MESOS-1008
> >
> >
> > Repository: mesos-git
> >
> >
> > Description
> > -------
> >
> > See summary
> >
> >
> > Diffs
> > -----
> >
> >   src/linux/fs.hpp 1d86dd0d24c3daae957b5eec387638d1e8e6d7db
> >   src/linux/fs.cpp e5f4f9a16becd4e5960d0cbb7f988736188b2426
> >   src/log/log.cpp 62dc9286285d5981aa5fc63e125d3c3f51d4f457
> >   src/sched/sched.cpp dcb3158d2acd93a107fda51b19e92899a092e628
> >
> > Diff: https://reviews.apache.org/r/18386/diff/
> >
> >
> > Testing
> > -------
> >
> > make check
> >
> > 'grep' for cases where Options are reassigned after references are taken.
> >
> >
> > Thanks,
> >
> > Dominic Hamon
> >
> >
>
>


-- 
Dominic Hamon | @mrdo | Twitter
*There are no bad ideas; only good ideas that go horribly wrong.*

Reply via email to