Very interesting. I don't know if it's the same problem, but quite a while
ago I noticed that make -jN all/install would fail when traversing opal.
My workaround was just a script that runs plain make all in opal, then
goes back to make -jN for orte/ompi.

Perhaps this would fix that problem too....

Thanks Ralf!


On 6/3/08 3:53 PM, "Ralf Wildenhues" <ralf.wildenh...@gmx.de> wrote:

> Hi Jeff,
> 
> * Jeff Squyres wrote on Tue, Jun 03, 2008 at 11:11:32PM CEST:
>> ERROR: Command returned a non-zero exist status
>>        make -j 4 distcheck
> [...]
>> Making install in etc
>> make[3]: Entering directory `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r18551/ompi/openmpi-1.3a1r18551/_build/opal/etc'
>> make[4]: Entering directory `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r18551/ompi/openmpi-1.3a1r18551/_build/opal/etc'
>> test -z "/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r18551/ompi/openmpi-1.3a1r18551/_inst/etc" || /bin/mkdir -p "/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r18551/ompi/openmpi-1.3a1r18551/_inst/etc"
>> /usr/bin/install -c -m 644 ../../../opal/etc/openmpi-mca-params.conf /home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r18551/ompi/openmpi-1.3a1r18551/_inst/etc/openmpi-mca-params.conf
>> /usr/bin/install: cannot create regular file `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r18551/ompi/openmpi-1.3a1r18551/_inst/etc/openmpi-mca-params.conf': No such file or directory
>> make[4]: *** [install-data-local] Error 1
>> make[4]: *** Waiting for unfinished jobs....
>> make[4]: *** Waiting for unfinished jobs....
>> make[4]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r18551/ompi/openmpi-1.3a1r18551/_build/opal/etc'
>> make[3]: *** [install-am] Error 2
>> make[3]: Leaving directory `/home/mpiteam/openmpi/nightly-tarball-build-root/trunk/create-r18551/ompi/openmpi-1.3a1r18551/_build/opal/etc'
>> make[2]: *** [install-recursive] Error 1
> 
> Nice clue, thanks.  This is a bug in opal/etc/Makefile.am:
> 
> --- quote opal/etc/Makefile.am ---
> # This has to be here, even though it's empty, so that AM thinks that
> # "something" will happen here (details fuzzy, but we remember that this
> # *needs* to be here -- you have been warned).
> 
> sysconf_DATA = 
> 
> # Steal a little trickery from a generated Makefile to only install
> # files if they do not already exist at the target.
> 
> install-data-local:
> @ p="$(opal_config_files)"; \
> for file in $$p; do \
>  if test -f $(DESTDIR)$(sysconfdir)/$$file; then \
>    echo "******************************* WARNING ************************************"; \
>    echo "*** Not installing new $$file over existing file in:"; \
>    echo "***   $(DESTDIR)$(sysconfdir)/$$file"; \
>    echo "******************************* WARNING ************************************"; \
>  else \
>    if test -f "$$file"; then d=; else d="$(srcdir)/"; fi; \
>    f="`echo $$file | sed -e 's|^.*/||'`"; \
>    echo " $(INSTALL_DATA) $$d$$file $(DESTDIR)$(sysconfdir)/$$f"; \
>    $(INSTALL_DATA) $$d$$file $(DESTDIR)$(sysconfdir)/$$f; \
>  fi; \
> done
> --- snip ---
> 
> To clarify the mysterious comment above, the "sysconf_DATA =" line
> causes automake to emit an undocumented target install-sysconfDATA which
> effectively runs something like
>   mkdir -p $(DESTDIR)$(sysconfdir)
> 
> and then installs zero files there.  The install-data-local rule is also
> updated as a dependency of 'install', just like install-sysconfDATA,
> but no dependency relation exists between the two.  With parallel make
> they can therefore run concurrently, which I assume is what happened in
> your case: although the log shows them in the right order, mkdir may
> not have finished its work before install-data-local accessed the
> directory.
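
To make the race described above concrete, here is a minimal stand-alone
sketch. It is not taken from the OMPI tree: the target names, the DEST
path, and the sleep are all made up for illustration. make-dir stands in
for install-sysconfDATA and copy-file for install-data-local. Recipe
lines must start with a tab.

```make
# Hypothetical stand-alone Makefile; names and paths are illustrative only.
DEST = _inst/etc

.PHONY: install make-dir copy-file

# Both targets are prerequisites of 'install', but neither depends on the
# other, so 'make -j2 install' is free to start them at the same time.
install: make-dir copy-file

# Stands in for install-sysconfDATA; the sleep only widens the race window.
make-dir:
	sleep 1; mkdir -p $(DEST)

# Stands in for install-data-local; fails if $(DEST) does not exist yet.
copy-file:
	cp Makefile $(DEST)/
```

With GNU make, a serial "make install" should handle the prerequisites in
the order listed and succeed, while "make -j2 install" would be expected to
fail in copy-file because the directory is not there yet. Declaring
"copy-file: make-dir" (or creating the directory inside copy-file itself)
removes the race.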
> 
> An easy fix is to use install-data-hook instead, which is documented to
> run after the normal install rules; or to generate the directory in the
> install-data-local rule itself, and drop the sysconf_DATA line.
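
For reference, the first alternative mentioned above (install-data-hook)
might look roughly like the sketch below. It is untested and assumes that
the automake version in use still emits install-sysconfDATA, and with it
the mkdir, for an empty sysconf_DATA, as described earlier. The loop is
the one already quoted from opal/etc/Makefile.am, with the WARNING banner
lines left out for brevity. Recipe lines must start with a tab.

```make
# Keep the empty list so that automake emits install-sysconfDATA, which
# creates $(DESTDIR)$(sysconfdir) (assumption: true for the automake in use).
sysconf_DATA =

# install-data-hook is run only after the other install-data rules have
# finished, so the directory should exist by the time the loop starts.
install-data-hook:
	@ p="$(opal_config_files)"; \
	for file in $$p; do \
	  if test -f $(DESTDIR)$(sysconfdir)/$$file; then \
	    echo "*** Not installing new $$file over existing file in:"; \
	    echo "***   $(DESTDIR)$(sysconfdir)/$$file"; \
	  else \
	    if test -f "$$file"; then d=; else d="$(srcdir)/"; fi; \
	    f="`echo $$file | sed -e 's|^.*/||'`"; \
	    echo " $(INSTALL_DATA) $$d$$file $(DESTDIR)$(sysconfdir)/$$f"; \
	    $(INSTALL_DATA) $$d$$file $(DESTDIR)$(sysconfdir)/$$f; \
	  fi; \
	done
```

The trade-off against the patch below is that this variant keeps relying on
the side effect of the empty sysconf_DATA line, whereas the patch makes the
directory creation explicit in the rule that needs it.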
> 
> Proposed, untested patch below.
> 
> I have not checked whether there are more instances of this in OMPI.
> 
> Cheers,
> Ralf
> 
> Fix race condition in 'make install': let install-data-local
> create $(sysconfdir), rather than an automake-generated rule
> which may be run in parallel (with make -j).
> 
> Index: opal/etc/Makefile.am
> ===================================================================
> --- opal/etc/Makefile.am (revision 17766)
> +++ opal/etc/Makefile.am (working copy)
> @@ -23,16 +23,11 @@
>  
>  EXTRA_DIST = $(opal_config_files)
>  
> -# This has to be here, even though it's empty, so that AM thinks that
> -# "something" will happen here (details fuzzy, but we remember that this
> -# *needs* to be here -- you have been warned).
> -
> -sysconf_DATA = 
> -
>  # Steal a little trickery from a generated Makefile to only install
>  # files if they do not already exist at the target.
>  
>  install-data-local:
> + $(mkdir_p) $(DESTDIR)$(sysconfdir)
> @ p="$(opal_config_files)"; \
> for file in $$p; do \
>  if test -f $(DESTDIR)$(sysconfdir)/$$file; then \