Yes - but the processes must stay in the same location: they can be suspended and resumed in place, not moved to other nodes.
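
To make "suspend and later resume in place" concrete, here is a minimal sketch of stopping and continuing a child process with signals. It is not Open MPI code and does not show how orted forwards signals; the helper names (suspend, resume, demo) are made up for illustration, and the child is just a sleeping Python process standing in for an application process. The sketch sends SIGSTOP instead of the SIGTSTP mentioned below, because SIGTSTP can be caught by the target or discarded when its process group is orphaned, which would leave the waitpid call blocked. It assumes a Unix-like system.

```python
import os
import signal
import subprocess
import sys


def suspend(pid, sig=signal.SIGSTOP):
    """Stop a child process in place and wait until the kernel reports the stop."""
    os.kill(pid, sig)
    # WUNTRACED makes waitpid return when the child stops, not only when it exits.
    _, status = os.waitpid(pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)


def resume(pid):
    """Continue a stopped child and wait until the kernel reports it running again."""
    os.kill(pid, signal.SIGCONT)
    # WCONTINUED makes waitpid return when a stopped child is continued.
    _, status = os.waitpid(pid, os.WCONTINUED)
    return os.WIFCONTINUED(status)


def demo():
    # Hypothetical stand-in for one application process; it never changes node.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        stopped = suspend(child.pid)
        resumed = resume(child.pid)
    finally:
        # Clean up the stand-in process regardless of what happened above.
        child.kill()
        child.wait()
    return stopped, resumed


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the process keeps its PID and its host across the stop/continue cycle, which is the "same location" constraint above.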

> On Apr 13, 2015, at 12:02 PM, Federico Reghenzani 
> <federico1.reghenz...@mail.polimi.it> wrote:
> 
> Thank you.
> 
> As a workaround, is it possible to temporarily suspend the processes on a 
> node and resume them later (at the RM's request)? I saw in the code that 
> orted can receive SIGTSTP and SIGCONT to suspend/resume processes.
> 
> 
> Cheers,
> Federico Reghenzani
> 
> 2015-04-10 16:58 GMT+02:00 Ralph Castain <r...@open-mpi.org>:
> I’m afraid not. The MPI job would not be very happy to suddenly lose some 
> nodes during execution, and relocating MPI processes during execution is 
> something we don’t currently support.
> 
> There is work underway to integrate the RM more fully into that procedure so 
> it could tell the MPI job to checkpoint, wait until that completed, terminate 
> the job, and then fast-restart it on the new nodes - but that isn’t here yet.
> 
> 
>> On Apr 10, 2015, at 7:54 AM, Federico Reghenzani 
>> <federico1.reghenz...@mail.polimi.it> wrote:
>> 
>> Can the RM ask for some nodes to be deallocated?
>> 
>> For example, mpirun asks the RM which resources are available (say node1, 
>> node2, and node3) and spawns orted on those nodes. Later, while the job is 
>> running, can the RM ask to release node3, or to move the processes on node3 
>> to node4?
>> 
>> Cheers,
>> Federico Reghenzani
>> 
>> 2015-03-26 18:09:22 GMT+06:00 Artem Polyakov <artpol84_at_[hidden]>:  
>> 
>> P.S. also check ESS (orte/mca/ess) for environment setup. 
>> 2015-03-26 18:06 GMT+06:00 Artem Polyakov <artpol84_at_[hidden]>: 
>> > 
>> > 2015-03-26 17:58 GMT+06:00 Gianmario Pozzi <pozzigmario_at_[hidden]>: 
>> > 
>> >> Hi everyone, 
>> >> I'm an Italian M.Sc. student in Computer Engineering at Politecnico di 
>> >> Milano. 
>> >> 
>> >> My team and I are trying to integrate OpenMPI with BBQ, a real-time 
>> >> resource manager written by a group of students ( 
>> >> http://bosp.dei.polimi.it/ ). We are running into some trouble, though. 
>> >> 
>> >> Our main problem is understanding how ORTE interacts with the resource 
>> >> manager, and which parts of the code (if any) are executed on the 
>> >> "slave" nodes and which on the "master". 
>> >> We spent some time looking at the source code, but we still have many 
>> >> open questions. 
>> >> 
>> > 
>> > Hello, 
>> > check orte/mca/plm and orte/mca/ras 
>> > PLM - process lifecycle manager 
>> > RAS - resource allocation subsystem. 
>> > 
>> > In RAS, mpirun detects which RM it is running under and gets the 
>> > allocation. 
>> > In PLM, the remote processes are spawned. 
>> > mpirun spawns the orted daemons on the slave nodes, and everything else 
>> > is done without RM intervention (IMHO). 
>> > 
>> > 
>> >> 
>> >> Thank you. 
>> >> 
>> >> _______________________________________________ 
>> >> devel mailing list 
>> >> devel_at_[hidden] 
>> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel 
>> >> Link to this post: 
>> >> http://www.open-mpi.org/community/lists/devel/2015/03/17157.php 
>> >> 
>> > 
>> > 
>> > 
>> > -- 
>> > Best regards, Artem Y. Polyakov 
>> > 
>> 
>> --
>> Best regards, Artem Y. Polyakov
