I, too, have tried various builds of the rc4 release. It's dying
during orterun.
Specifically, here's the call chain where things fall apart:
orterun -> orte_init -> opal_init -> opal_carto_base_select -> mca_base_select
54 for (item = opal_list_get_first(components_available);
55 item != opal_list_get_end(components_available);
56 item = opal_list_get_next(item) ) {
57 cli = (mca_base_component_list_item_t *) item;
58 component = (mca_base_component_t *) cli->cli_component;
The loop condition on line #55 is false on the very first pass, i.e.
item must already equal the list end, which means components_available
is empty. The code then jumps to line #107 and passes the NULL test
there:
107 if (NULL == *best_component) {
108 opal_output_verbose(5, output_id,
109 "mca:base:select:(%5s) No component selected!",
110 type_name);
111 /*
112 * Still close the non-selected components
113 */
114 mca_base_components_close(0, /* Pass 0 to keep this from closing the output handle */
115 components_available,
116 NULL);
117 return OPAL_ERR_NOT_FOUND;
118 }
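To make the failure mode concrete, here is a minimal, self-contained sketch of the same selection pattern. Everything prefixed sketch_ is hypothetical and stands in for the real opal_list and mca_base code; it is not the actual Open MPI implementation. The value -13 is taken from the error output quoted below and is only assumed here to correspond to OPAL_ERR_NOT_FOUND. With an empty list the loop body never runs, the best pointer stays NULL, and the function returns the not-found code:

```c
#include <stddef.h>

/* Hypothetical return codes; -13 mirrors the value in the quoted log. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_NOT_FOUND = -13 };

/* Hypothetical list item standing in for a component list entry. */
typedef struct sketch_item {
    struct sketch_item *next;
    int priority;
    const char *name;
} sketch_item_t;

/* Singly linked list with a sentinel acting as the "end" marker.
 * The list is empty when sentinel.next points back at the sentinel. */
typedef struct {
    sketch_item_t sentinel;
} sketch_list_t;

void sketch_list_init(sketch_list_t *list)
{
    list->sentinel.next = &list->sentinel;
}

sketch_item_t *sketch_list_get_first(sketch_list_t *list)
{
    return list->sentinel.next;
}

sketch_item_t *sketch_list_get_end(sketch_list_t *list)
{
    return &list->sentinel;
}

void sketch_list_prepend(sketch_list_t *list, sketch_item_t *item)
{
    item->next = list->sentinel.next;
    list->sentinel.next = item;
}

/* Pick the highest-priority item. If the list is empty, first == end,
 * the loop body is skipped, and *best is still NULL afterwards. */
int sketch_select(sketch_list_t *available, sketch_item_t **best)
{
    sketch_item_t *item;

    *best = NULL;
    for (item = sketch_list_get_first(available);
         item != sketch_list_get_end(available);
         item = item->next) {
        if (NULL == *best || item->priority > (*best)->priority) {
            *best = item;
        }
    }

    if (NULL == *best) {
        return SKETCH_ERR_NOT_FOUND;
    }
    return SKETCH_SUCCESS;
}
```

If this is really what is happening, the thing to chase is why no carto components end up in components_available in the first place, rather than the selection logic itself.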
-david
--
David Gunter
HPC-3: Infrastructure Team
Los Alamos National Laboratory
Sam Gutierrez wrote:
> Hi All,
> I just built OMPI 1.3.4rc4 on one of our Roadrunner machines. When I
> try to launch a simple MPI job, I get the following:
> [rra011a.rr.lanl.gov:31601] mca: base: components_open: Looking for
> carto components
> [rra011a.rr.lanl.gov:31601] mca: base: components_open: opening carto
> components
> [rra011a.rr.lanl.gov:31601] mca:base:select: Auto-selecting carto
> components
> [rra011a.rr.lanl.gov:31601] mca:base:select:(carto) No component
> selected!
> --------------------------------------------------------------------------
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
> opal_carto_base_select failed
> --> Returned value -13 instead of OPAL_SUCCESS
> --------------------------------------------------------------------------
> [rra011a.rr.lanl.gov:31601] [[INVALID],INVALID] ORTE_ERROR_LOG: Not
> found in file runtime/orte_init.c at line 77
> [rra011a.rr.lanl.gov:31601] [[INVALID],INVALID] ORTE_ERROR_LOG: Not
> found in file orterun.c at line 541
> This may be an issue on our end regarding a runtime parameter that
> isn't set correctly. See attached. Please let me know if you need
> any more info.
> Thanks!
> --
> Samuel K. Gutierrez
> Los Alamos National Laboratory
On Nov 4, 2009, at 3:00 PM, Jeff Squyres wrote:
> The latest-n-greatest is available here:
>
> http://www.open-mpi.org/software/ompi/v1.3/
>
> Please beat it up and look for problems!
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]