[ 
https://issues.apache.org/jira/browse/MESOS-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263273#comment-13263273
 ] 

[email protected] commented on MESOS-110:
-----------------------------------------------------



bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > I've only gotten halfway through ... but there is a bunch here already. 
I'd like to break this up into at least four patches. (1) The utils stuff that 
was added. (2) The master changes. (3) The slave::path namespace stuff. (3) The 
status update manager API + implementation (but not the slave using it yet). 
And (4) the slave using each of these components, and the executor changes that 
are included.
bq.  > 
bq.  > These comments are across all of those patches, but I'll make future 
passes on each of those components.

addressed the comments for the utils.hpp part. Will send a review for utils.hpp 
and protobuf_utils.hpp (forgot to include it in this review) shortly.


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 211
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line211>
bq.  >
bq.  >     Add that this is at the current file position of the file descriptor.

done


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 218
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line218>
bq.  >
bq.  >     How is this helpful? (If this came from my code, it should be 
removed.)

it did come from your code. killed it.


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 259
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line259>
bq.  >
bq.  >     Blah.

fixed format.


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 253
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line253>
bq.  >
bq.  >     I'd prefer if this did not seek to the beginning and read the file, 
but rather read from the current position until the end (and have the comment 
say as much).

line 263 was a bug. fixed it. this line should make sense now!


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 264
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line264>
bq.  >
bq.  >     This looks like a bug ('offset' as the third argument?).

good catch! fixed.


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 267
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line267>
bq.  >
bq.  >     No need for the space here though.

fixed


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 276
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line276>
bq.  >
bq.  >     Again, should be killed (only makes sense in a macro).

killed


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 290
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line290>
bq.  >
bq.  >     You should refactor the protobuf::read and protobuf::write to use 
these versions of read and write now as well.

didnt get u?


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 329
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line329>
bq.  >
bq.  >     man 3 dirname (and use it please).

aha..done


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 532
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line532>
bq.  >
bq.  >     s/file_pattern/pattern

done


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 534
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line534>
bq.  >
bq.  >     s/result/results

done


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 536
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line536>
bq.  >
bq.  >     Kill.

done


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 538
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line538>
bq.  >
bq.  >     Why not return a Try instead?

done


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 546
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line546>
bq.  >
bq.  >     Why is this a hack?

because we are using exists() function to check isDir() semantics, based on the 
knowledge that the entry always exists.


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 549
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line549>
bq.  >
bq.  >     s/p/result or s/p/path

done


bq.  On 2012-04-25 22:11:01, Benjamin Hindman wrote:
bq.  > src/common/utils.hpp, line 555
bq.  > <https://reviews.apache.org/r/4462/diff/3/?file=103024#file103024line555>
bq.  >
bq.  >     Kill.

done


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4462/#review7232
-----------------------------------------------------------


On 2012-04-19 16:53:07, Vinod Kone wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4462/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-04-19 16:53:07)
bq.  
bq.  
bq.  Review request for mesos, Benjamin Hindman and John Sirois.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Sorry for the huge  CL!
bq.  
bq.  Slave restarts now supports recovery!
bq.  --> Non-disruptive restart means running tasks are not lost
bq.  --> Re-connects with live executors
bq.  --> Checkpoints and reliably sends status updates
bq.  --> Ability to kill executors if the slave upgrade is incompatible with 
running executors
bq.  
bq.  
bq.  This addresses bug mesos-110.
bq.      https://issues.apache.org/jira/browse/mesos-110
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/Makefile.am d5edaa2 
bq.    src/common/hashset.hpp 1feb610 
bq.    src/common/utils.hpp 1d81e21 
bq.    src/exec/exec.cpp e8db407 
bq.    src/launcher/launcher.cpp a141b9a 
bq.    src/local/local.hpp 55f9eaf 
bq.    src/local/local.cpp affe432 
bq.    src/master/master.cpp 4dc9ee0 
bq.    src/messages/messages.proto 87e1548 
bq.    src/sched/sched.cpp dcadb10 
bq.    src/scripts/killtree.sh bceae9d 
bq.    src/slave/constants.hpp f0c8679 
bq.    src/slave/http.cpp 19c48a0 
bq.    src/slave/isolation_module.hpp c896908 
bq.    src/slave/lxc_isolation_module.hpp b7beefe 
bq.    src/slave/lxc_isolation_module.cpp 66a2a89 
bq.    src/slave/main.cpp 85cba25 
bq.    src/slave/process_based_isolation_module.hpp f6f9554 
bq.    src/slave/process_based_isolation_module.cpp 2b37d42 
bq.    src/slave/slave.hpp 279bc7b 
bq.    src/slave/slave.cpp 3358ec4 
bq.    src/slave/statusupdates_manager.hpp PRE-CREATION 
bq.    src/slave/statusupdates_manager.cpp PRE-CREATION 
bq.    src/tests/external_tests.cpp d1b20e4 
bq.    src/tests/fault_tolerance_tests.cpp 6772daf 
bq.    src/tests/slave_restart_tests.cpp PRE-CREATION 
bq.    src/tests/utils.hpp e81ec82 
bq.  
bq.  Diff: https://reviews.apache.org/r/4462/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  make check.
bq.  
bq.  Note that only the new test in tests/slave_restart_tests.cpp  engages in 
recovery!
bq.  
bq.  Recovery is disabled for old tests (though they still checkpoint relevant 
info!)
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Vinod
bq.  
bq.


                
> Mesos deploys should not restart tasks
> --------------------------------------
>
>                 Key: MESOS-110
>                 URL: https://issues.apache.org/jira/browse/MESOS-110
>             Project: Mesos
>          Issue Type: Improvement
>          Components: framework
>            Reporter: Rob Benson
>            Assignee: Vinod Kone
>
> Running a long-lived service on Mesos has a significant drawback right now in 
> that Mesos build deploys restart your tasks. This could lead to nontrivial 
> outages for services that have a high warm-up time.  Basically everything 
> would need a graceful restart mechanism that basically allows a 
> shutdown/restart with a new version of the code. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to