On Nov 13, 2006, at 10:27 AM, Ethan Mallove wrote:
I can infer that you have an MPI Install section labeled "odin 64 bit gcc". A few questions:

* What is the mpi_get for that section (or how does that parameter get filled in by your automated scripts)?
I attached the generated INI file for you to look at.
nightly-trunk-64-gcc.ini-gen
It is the same value for all parallel runs of GCC+64bit (same value for all branches)
* Do you start with a fresh scratch tree every run?
Yep. Every run, and all of the parallel runs.
* Could you email me your scratch/installs/mpi_installs.xml files?
<mpi_installs>
  <mpi_get simple_section_name="ompi-nightly-trunk">
    <mpi_version version="1.3a1r12559">
      <mpi_install simple_section_name="odin 64 bit gcc"
                   append_path=""
                   bindir="/san/homedirs/mpiteam/mtt-runs/odin/20061112-Nightly/parallel-block-1/installs/ompi-nightly-trunk/odin_64_bit_gcc/1.3a1r12559/install/bin"
                   c_bindings="1"
                   compiler_name="gnu"
                   compiler_version="3.4.6"
                   configure_arguments="FCFLAGS=-m64 FFLAGS=-m64 CFLAGS=-m64 CXXFLAGS=-m64 --with-wrapper-cflags=-m64 --with-wrapper-cxxflags=-m64 --with-wrapper-fflags=-m64 --with-wrapper-fcflags=-m64"
                   cxx_bindings="1"
                   f77_bindings="1"
                   f90_bindings="1"
                   full_section_name="mpi install: odin 64 bit gcc"
                   installdir="/san/homedirs/mpiteam/mtt-runs/odin/20061112-Nightly/parallel-block-1/installs/ompi-nightly-trunk/odin_64_bit_gcc/1.3a1r12559/install"
                   libdir="/san/homedirs/mpiteam/mtt-runs/odin/20061112-Nightly/parallel-block-1/installs/ompi-nightly-trunk/odin_64_bit_gcc/1.3a1r12559/install/lib"
                   merge_stdout_stderr="1"
                   mpi_details="Open MPI"
                   mpi_get_full_section_name="mpi get: ompi-nightly-trunk"
                   mpi_get_simple_section_name="ompi-nightly-trunk"
                   mpi_version="1.3a1r12559"
                   prepend_path=""
                   result_message="Success"
                   setenv=""
                   success="1"
                   test_status="installed"
                   timestamp="1163316821"
                   unsetenv=""
                   vpath_mode="none" />
    </mpi_version>
  </mpi_get>
</mpi_installs>

The attached mpi_installs.xml is from the trunk+gcc+64bit parallel scratch directory.
I checked on how widespread this issue is, and found that 18,700 out of 474,000 Test Run rows in the past month have an mpi_version/command (v1.2 vs. trunk) mismatch, occurring in both directions (version=1.2, command=trunk and vice versa). They occur on these clusters:

* Cisco MPI development cluster
* IU Odin
* IU - Thor - TESTING
Interesting...
There *is* that race condition in which one MTT client's submission could overwrite another's index. Do you have "trunk" and "1.2" runs submitting to the database at the same time?
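To make that failure mode concrete, here is a minimal sketch of the classic read-modify-write race on a client-assigned index. The schema and values are hypothetical (not MTT's actual tables); the point is that two submitters who each compute "max index + 1" before either inserts will collide, so one run's row ends up carrying the other run's metadata, which is exactly a version/command mismatch.

```python
import sqlite3

# Hypothetical stand-in for MTT's results table (not the real schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (idx INTEGER PRIMARY KEY, mpi_version TEXT, command TEXT)")

def next_index(conn):
    # Unsafe: the client reads MAX(idx) outside any transaction or lock.
    return conn.execute("SELECT COALESCE(MAX(idx), 0) + 1 FROM results").fetchone()[0]

# Two concurrent submitters both read the same "next" index...
idx_trunk = next_index(db)  # client A (trunk run)
idx_v12 = next_index(db)    # client B (1.2 run) reads before A inserts
assert idx_trunk == idx_v12

# ...so the second write clobbers the first client's row, leaving
# one run's index attached to the other run's metadata.
db.execute("INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
           (idx_trunk, "1.3a1r12559", "mpirun ... trunk install ..."))
db.execute("INSERT OR REPLACE INTO results VALUES (?, ?, ?)",
           (idx_v12, "1.2b1", "mpirun ... 1.2 install ..."))
print(db.execute("SELECT COUNT(*) FROM results").fetchone()[0])  # 1 row survives, not 2
```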
Yes we do. :(

The "parallel blocks," as we call them, are separate scratch directories in which MTT runs concurrently: we have N parallel-block scratch directories, each running one instance of MTT. So it is possible (and highly likely) that all N parallel blocks fire their Reporter phase at about the same time.
Without knowing how the reporter does the inserts into the database, I don't think I can help much more than that on debugging. When the reporter fires for the DB:

- Does it start a transaction for the connection, do the inserts, then commit?
- Does it ship the inserts to the server and let it run them, or does the client do all of the individual inserts?
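For comparison, a hedged sketch (again with a made-up schema, not MTT's real reporter code) of the pattern those questions are driving at: wrap each reporter's inserts in one transaction and let the database assign the index, so concurrent submitters cannot hand out the same one.

```python
import sqlite3

# Hypothetical stand-in for MTT's results table (not the real schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE results (idx INTEGER PRIMARY KEY AUTOINCREMENT,"
           " mpi_version TEXT, command TEXT)")

def submit(conn, rows):
    # One transaction per reporter phase: all rows commit together or not at all.
    with conn:  # BEGIN ... COMMIT, or ROLLBACK on exception
        for version, command in rows:
            # idx is omitted: the database assigns it atomically.
            conn.execute("INSERT INTO results (mpi_version, command) VALUES (?, ?)",
                         (version, command))

# Two "parallel block" submissions; each keeps its own rows intact.
submit(db, [("1.3a1r12559", "mpirun ... dynamic/spawn")])
submit(db, [("1.2b1", "mpirun ... dynamic/spawn")])
rows = db.execute("SELECT idx, mpi_version FROM results ORDER BY idx").fetchall()
print(rows)  # [(1, '1.3a1r12559'), (2, '1.2b1')]
```

The design point is simply that index allocation moves from the client (a racy read-then-write) into the database's own atomic machinery.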
-- Josh
On Sun, Nov/12/2006 06:04:17PM, Jeff Squyres (jsquyres) wrote:

I feel somewhat better now. Ethan - can you fix?

-----Original Message-----
From: Tim Mattox [mailto:timat...@open-mpi.org]
Sent: Sunday, November 12, 2006 05:34 PM Eastern Standard Time
To: General user list for the MPI Testing Tool
Subject: [MTT users] Corrupted MTT database or incorrect query

Hello,
I just noticed that the MTT summary page is presenting incorrect information for our recent runs at IU. It is showing failures for 1.2b1 that actually came from the trunk! See the first entry in this table:

http://www.open-mpi.org/mtt/reporter.php?&maf_start_test_timestamp=2006-11-12%2019:12:02%20through%202006-11-12%2022:12:02&ft_platform_id=contains&tf_platform_id=IU&maf_phase=runs&maf_success=fail&by_atom=*by_test_case&go=Table&maf_agg_timestamp=-&mef_mpi_name=All&mef_mpi_version=All&mef_os_name=All&mef_os_version=All&mef_platform_hardware=All&mef_platform_id=All&agg_platform_id=off&1-page=off&no_bookmarks&no_bookmarks

Click on the [i] in the upper right (the first entry) to get the popup window, which shows the mpirun command as:

mpirun -mca btl tcp,sm,self -np 6 --prefix /san/homedirs/mpiteam/mtt-runs/odin/20061112-Testing-NOCLN/parallel-block-3/installs/ompi-nightly-trunk/odin_64_bit_gcc/1.3a1r12559/install dynamic/spawn

Note the path has "1.3a1r12559" in the name... it's a run from the trunk, yet the table showed this as a 1.2b1 run. There are several of these misattributed errors. This would explain why Jeff saw some ddt errors on the 1.2 branch yesterday but was unable to reproduce them. They were from the trunk!

--
Tim Mattox - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/

_______________________________________________
mtt-users mailing list
mtt-us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users

--
-Ethan
----
Josh Hursey
jjhur...@open-mpi.org
http://www.open-mpi.org/