Okay, the irony here is truly humorous. This took several hours to chase
down.

As you may recall, we had an earlier problem with the unity routed module
where I gave you a couple of options for repairing it. Well, it turned out
that the latest changes obviated the need for that hack...and so the hack
caused the system to fail.

So, having now removed the prior hack required to keep the module alive, you
should find it happy again!

BTW: it isn't that the unity module is such a pain in itself. The problem
lies in our efforts to shift data movement to the daemon level for
scalability, versus the inherent "everything happens directly between the
apps" approach of the unity module. As we move more and more things to the
daemon level, we are achieving the scalability we want - it just makes it
harder to find a way to blend the conflicting approach in unity so it can
keep running.
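
To put rough numbers on that tradeoff, here is a toy sketch. It is not ORTE code: the function names are made up for illustration, and it assumes that each app process talks only to its local daemon and that the daemons are joined in a generic textbook binomial tree rooted at rank 0.

```python
def direct_links(num_procs):
    """Pairwise links if every process talks directly to every other one
    (the unity-style picture, under the assumption stated above)."""
    return num_procs * (num_procs - 1) // 2


def daemon_routed_links(num_procs, num_daemons):
    """Links if each process talks only to its local daemon and the daemons
    form a tree (a tree over n nodes always has n - 1 edges)."""
    return num_procs + (num_daemons - 1)


def binomial_children(rank, size):
    """Children of `rank` in a binomial tree rooted at 0 over `size` ranks."""
    children = []
    mask = 1
    # Children are rank + 2^k for every 2^k below the rank's lowest set bit.
    while mask < size:
        if rank & mask:
            break  # reached the lowest set bit: no further children
        child = rank | mask
        if child < size:
            children.append(child)
        mask <<= 1
    return children


if __name__ == "__main__":
    print(direct_links(64))            # 2016
    print(daemon_routed_links(64, 8))  # 71
    print(binomial_children(0, 8))     # [1, 2, 4]
```

With 64 processes on 8 nodes, the direct picture needs 2016 pairwise links while the daemon-routed one needs 71, which is the scalability gap I'm describing.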

I believe we have reached a point, though, where it may be easier to
keep that module alive. Everything we need to shift to the daemons has now
been shifted, so I don't believe unity is going to present as much of a
problem going forward.

I still think it would be good for you to get C/R to work with non-unity
routed modules for scalability reasons - unity is still inherently
non-scalable. But hopefully it won't be as much of a roller-coaster for you
as we go forward.

Thanks for your patience,
Ralph


On 4/9/08 5:15 PM, "Ralph Castain" <r...@lanl.gov> wrote:

> Groan...yes, will look at it this evening and get it fixed as quickly as I
> can.
> 
> Sorry...like I said, unity is getting harder and harder to keep alive. :-/
> 
> Ralph
> 
> 
> On 4/9/08 5:01 PM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:
> 
>> Ralph,
>> 
>> It seems that the 'unity' component of the routed framework is broken
>> as a result of this commit. :(
>> 
>> Any chance you can take a look at this?
>> 
>> Thanks,
>> Josh
>> 
>> On Apr 9, 2008, at 6:10 PM, r...@osl.iu.edu wrote:
>>> Author: rhc
>>> Date: 2008-04-09 18:10:53 EDT (Wed, 09 Apr 2008)
>>> New Revision: 18115
>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/18115
>>> 
>>> Log:
>>> Fully implement the inbound binomial allgather for daemon-based
>>> collectives. Supports both modex and barrier operations.
>>> 
>>> Comm_spawn still uses the rank=0 method - shifting that algo to the
>>> daemons is under study.
>>> 
>>> 
>>> Removed:
>>>   trunk/orte/mca/grpcomm/base/grpcomm_base_barrier.c
>>>   trunk/orte/mca/grpcomm/exp/
>>> Text files modified:
>>>   trunk/ompi/mca/pml/ob1/pml_ob1.c                             |     1
>>>   trunk/orte/mca/ess/hnp/ess_hnp_module.c                      |     2
>>>   trunk/orte/mca/grpcomm/base/Makefile.am                      |     1
>>>   trunk/orte/mca/grpcomm/base/base.h                           |     3
>>>   trunk/orte/mca/grpcomm/base/grpcomm_base_allgather.c         |   253 -----------
>>>   trunk/orte/mca/grpcomm/basic/grpcomm_basic_component.c       |     4
>>>   trunk/orte/mca/grpcomm/basic/grpcomm_basic_module.c          |   832 ++++++++++++++++++++++++++++++++++-----
>>>   trunk/orte/mca/grpcomm/cnos/grpcomm_cnos_module.c            |     8
>>>   trunk/orte/mca/grpcomm/grpcomm.h                             |    27 +
>>>   trunk/orte/mca/grpcomm/grpcomm_types.h                       |     8
>>>   trunk/orte/mca/odls/base/odls_base_close.c                   |     1
>>>   trunk/orte/mca/odls/base/odls_base_default_fns.c             |   131 ++++-
>>>   trunk/orte/mca/odls/base/odls_base_open.c                    |    24 +
>>>   trunk/orte/mca/odls/base/odls_private.h                      |    16
>>>   trunk/orte/mca/plm/base/plm_base_launch_support.c            |     7
>>>   trunk/orte/mca/rmaps/base/rmaps_base_map_job.c               |     1
>>>   trunk/orte/mca/rmaps/base/rmaps_base_open.c                  |     4
>>>   trunk/orte/mca/rmaps/base/rmaps_base_support_fns.c           |   186 +-------
>>>   trunk/orte/mca/rmaps/base/rmaps_private.h                    |     2
>>>   trunk/orte/mca/rmaps/rank_file/rmaps_rank_file.c             |     2
>>>   trunk/orte/mca/rmaps/rmaps_types.h                           |    28 +
>>>   trunk/orte/mca/rmaps/round_robin/rmaps_rr.c                  |     8
>>>   trunk/orte/mca/rmaps/seq/rmaps_seq.c                         |     2
>>>   trunk/orte/mca/rml/rml_types.h                               |    36
>>>   trunk/orte/orted/orted_comm.c                                |    43 +-
>>>   trunk/orte/runtime/data_type_support/orte_dt_copy_fns.c      |     2
>>>   trunk/orte/runtime/data_type_support/orte_dt_packing_fns.c   |     4
>>>   trunk/orte/runtime/data_type_support/orte_dt_print_fns.c     |     4
>>>   trunk/orte/runtime/data_type_support/orte_dt_unpacking_fns.c |     4
>>>   trunk/orte/runtime/orte_globals.c                            |     3
>>>   trunk/orte/runtime/orte_globals.h                            |     1
>>>   trunk/orte/runtime/orte_globals_class_instances.h            |     2
>>>   32 files changed, 1019 insertions(+), 631 deletions(-)
>>> 
>>> 
>>> Diff not shown due to size (106446 bytes).
>>> To see the diff, run the following command:
>>> 
>>> svn diff -r 18114:18115 --no-diff-deleted
>>> 
>>> _______________________________________________
>>> svn mailing list
>>> s...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn
>> 
> 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

