> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/messages/messages.proto, line 65 > > <https://reviews.apache.org/r/7212/diff/1/?file=159109#file159109line65> > > > > Move type closer to the enum, and make it '= 1' (it's the thing that > > differentiates the type of message you have).
done > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.hpp, line 19 > > <https://reviews.apache.org/r/7212/diff/1/?file=159110#file159110line19> > > > > Why not match the filename? s/STATUSUPDATES/STATUS_UPDATES/ done > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.hpp, line 58 > > <https://reviews.apache.org/r/7212/diff/1/?file=159110#file159110line58> > > > > Use an Option (and default it to none) if you're going to optionally > > send a message based on executorPID == UPID(). done > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.cpp, line 39 > > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line39> > > > > If you used the slave's PID rather than self (line 243) then the slave > > could dispatch to statusUpdateAcknowledgement which would give the added > > benefit of pushing any errors in recording the status update > > acknowledgement back to the slave to take action. It's more work on the > > slave side, but it enables capturing those errors fairly cleanly. I'm happy > > to discuss this more in detail. agreed. i also like the added benefit of symmetry. passing slave pid via initialize(), so that we can inject different kind of SUMs (mock/no disk checkpointer) into the slave. > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.cpp, line 83 > > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line83> > > > > If the slave removed the executor from it's data structures, and the > > slave dispatches to the SUM, how is this case possible? This sounds like it > > should be more of a CHECK than a LOG. I dont' remember the scenario that needed me to have this log instead of check. Reverting this to check for now. I will update it later if the scenario is caught in tests. > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.cpp, line 102 > > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line102> > > > > We might need to rethink the 'acknowledged' set, since it grows > > (without bound) with the number of tasks. Discussed offline. The acknowledged set doesnt grow unbounded. > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.cpp, line 147 > > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line147> > > > > Is this just your guess about how this will work? Or is there code that > > actually uses this? If so, I'd love to see that review too please. This is how it is done in the slave. I was in 2 minds regarding where this recovery should happen. I went with all the recovery logic being inside the slave. Open to discussion. I will send out that review (integration with he slave) next. > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.cpp, line 136 > > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line136> > > > > How about having sendStatusUpdate return the timeout (since delay > > returns a Timer from which you can call Timer::timeout)? done > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.cpp, line 187 > > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line187> > > > > s/no associated/no associated/ hawk eye! > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.cpp, line 206 > > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line206> > > > > Just to be pedantic, you need to do this AFTER you write it to disk. :) > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.cpp, line 248 > > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line248> > > > > return delay(...).timeout(); done > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.cpp, line 282 > > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line282> > > > > The update can have a large 'data' chunk. Any reason not to just write > > the UUID so that we don't have to write the data chunk twice? good point. fixed > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.cpp, line 318 > > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line318> > > > > Better yet, do a get which should return an Option. even better, just killed this function in favor of inlining it :) > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.cpp, line 344 > > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line344> > > > > Again, Option seems best here. And delete after you remove from the > > data structure please. ;) How would that avoid double look up here? > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.cpp, line 123 > > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line123> > > > > You could return the error through the future and let the slave commit > > suicide. agreed. done > On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote: > > src/slave/status_updates_manager.hpp, line 47 > > <https://reviews.apache.org/r/7212/diff/1/?file=159110#file159110line47> > > > > There needs to be a high-level comment here describing the semantics of > > receiving (from an executor, when is the executor ACKed?), recording (what > > about frameworks that don't want the penalty of recording?), and sending > > status updates (how long are they resent?). Likewise, describing a few of > > the data structures here such as what a "status update stream" is, how we > > "name" it, how long it sticks around, etc. done. Let me know how it reads. - Vinod ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7212/#review12429 ----------------------------------------------------------- On Sept. 21, 2012, 6:20 p.m., Vinod Kone wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/7212/ > ----------------------------------------------------------- > > (Updated Sept. 21, 2012, 6:20 p.m.) > > > Review request for mesos, Benjamin Hindman and Ben Mahler. > > > Description > ------- > > This is Status Update Manager's API and implementation. > > Haven't included the integration (into slave) and tests yet. > > Rebased off latest trunk > > > Diffs > ----- > > src/Makefile.am df8696920fd48968907270decbef3dda0803f80a > src/messages/messages.proto 2a0321cb3d9e3f6499e2108a3b21516a3bd18d2f > src/slave/status_updates_manager.hpp PRE-CREATION > src/slave/status_updates_manager.cpp PRE-CREATION > > Diff: https://reviews.apache.org/r/7212/diff/ > > > Testing > ------- > > make check. > > > Thanks, > > Vinod Kone > >
