On Nov 13, 2006, at 10:27 AM, Ethan Mallove wrote:

I can infer that you have an MPI Install section labeled
"odin 64 bit gcc". A few questions:

* What is the mpi_get for that section (or how does that
  parameter get filled in by your automated scripts)?

I attached the generated INI file for you to look at.

Attachment: nightly-trunk-64-gcc.ini-gen
Description: Binary data


It is the same value for all of the parallel GCC+64-bit runs (and the same value for all branches).
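
To make the wiring concrete, here is a minimal sketch of how that part of the generated INI presumably hangs together. The section names are taken from the mpi_installs.xml below, the mpi_get parameter is the one asked about above, and everything else is elided (this is an illustration, not a copy of the attachment):

[MPI get: ompi-nightly-trunk]
...

[MPI install: odin 64 bit gcc]
mpi_get = ompi-nightly-trunk
...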


* Do you start with a fresh scratch tree every run?

Yep. Every run, and all of the parallel runs.

* Could you email me your scratch/installs/mpi_installs.xml
  files?

<mpi_installs>
  <mpi_get simple_section_name="ompi-nightly-trunk">
    <mpi_version version="1.3a1r12559">
      <mpi_install simple_section_name="odin 64 bit gcc"
                   append_path=""
                   bindir="/san/homedirs/mpiteam/mtt-runs/odin/20061112-Nightly/parallel-block-1/installs/ompi-nightly-trunk/odin_64_bit_gcc/1.3a1r12559/install/bin"
                   c_bindings="1"
                   compiler_name="gnu"
                   compiler_version="3.4.6"
                   configure_arguments="FCFLAGS=-m64 FFLAGS=-m64 CFLAGS=-m64 CXXFLAGS=-m64 --with-wrapper-cflags=-m64 --with-wrapper-cxxflags=-m64 --with-wrapper-fflags=-m64 --with-wrapper-fcflags=-m64"
                   cxx_bindings="1"
                   f77_bindings="1"
                   f90_bindings="1"
                   full_section_name="mpi install: odin 64 bit gcc"
                   installdir="/san/homedirs/mpiteam/mtt-runs/odin/20061112-Nightly/parallel-block-1/installs/ompi-nightly-trunk/odin_64_bit_gcc/1.3a1r12559/install"
                   libdir="/san/homedirs/mpiteam/mtt-runs/odin/20061112-Nightly/parallel-block-1/installs/ompi-nightly-trunk/odin_64_bit_gcc/1.3a1r12559/install/lib"
                   merge_stdout_stderr="1"
                   mpi_details="Open MPI"
                   mpi_get_full_section_name="mpi get: ompi-nightly-trunk"
                   mpi_get_simple_section_name="ompi-nightly-trunk"
                   mpi_version="1.3a1r12559"
                   prepend_path=""
                   result_message="Success"
                   setenv=""
                   success="1"
                   test_status="installed"
                   timestamp="1163316821"
                   unsetenv=""
                   vpath_mode="none" />
    </mpi_version>
  </mpi_get>
</mpi_installs>
The attached mpi_installs.xml is from the trunk+gcc+64bit parallel scratch directory.


I checked on how widespread this issue is, and found that
18,700 out of 474,000 Test Run rows in the past month have an
mpi_version/command mismatch (v1.2 vs. trunk), occurring in both
directions (version=1.2 with a trunk command, and vice versa).
They occur on these clusters (one way to spot such a row is
sketched after the list):

 Cisco MPI development cluster
 IU Odin
 IU - Thor - TESTING
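
One way to spot such a row is a simple containment test: the version a row is filed under should appear in the install path embedded in its own mpirun command. A rough sketch in Python (the row values are made up for illustration; this is not a query against the real MTT schema):

def version_matches_command(mpi_version, mpirun_cmd):
    # The install path in the command embeds the MPI version string
    # (e.g. ".../1.3a1r12559/install"), so a row filed under a different
    # version will not contain it.
    return mpi_version in mpirun_cmd

# Made-up example rows: (reported mpi_version, mpirun command).
rows = [
    ("1.3a1r12559",
     "mpirun -np 6 --prefix /scratch/installs/ompi-nightly-trunk/odin_64_bit_gcc/1.3a1r12559/install dynamic/spawn"),
    ("1.2b1",
     "mpirun -np 6 --prefix /scratch/installs/ompi-nightly-trunk/odin_64_bit_gcc/1.3a1r12559/install dynamic/spawn"),
]

mismatched = [(v, cmd) for v, cmd in rows if not version_matches_command(v, cmd)]
print(len(mismatched), "mismatched row(s)")  # the second row is the bad one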


Interesting...

There *is* that race condition in which one MTT client's
submission could overwrite another's index. Do you have "trunk"
and "1.2" runs submitting to the database at the same time?

Yes we do. :(

The "parallel blocks", as we call them, are separate scratch directories in which MTT runs concurrently: we have N parallel-block scratch directories, each running one instance of MTT. So it is possible (and highly likely) that when the reporter phase fires, all N parallel blocks are firing it at about the same time.

Without knowing how the reporter does the inserts into the database, I don't think I can help much more than that with debugging. When the reporter fires for the DB:
 - Does it start a transaction on the connection, do the inserts, then commit?
 - Does it ship the inserts to the server and let it run them, or does the client do each of the individual inserts itself?
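
For reference, here is a rough sketch of the "one transaction per submission" pattern I am asking about. This is not the reporter's actual code; the table and column names are invented, and sqlite3 just stands in for whatever the real server is:

import sqlite3

def submit_results(conn, mpi_version, commands):
    """Insert one client's batch of Test Run rows atomically."""
    cur = conn.cursor()
    try:
        # Insert the MPI version row and take its generated key directly,
        # instead of re-reading a shared "latest index" that a concurrently
        # submitting client may have overwritten (the suspected race).
        cur.execute("INSERT INTO mpi_version (version) VALUES (?)",
                    (mpi_version,))
        version_id = cur.lastrowid
        cur.executemany(
            "INSERT INTO test_run (mpi_version_id, command) VALUES (?, ?)",
            [(version_id, cmd) for cmd in commands])
        conn.commit()    # all of this client's rows land together...
    except Exception:
        conn.rollback()  # ...or not at all
        raise

# Toy usage with invented tables, just to show the shape of the calls.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mpi_version (id INTEGER PRIMARY KEY, version TEXT);
    CREATE TABLE test_run (id INTEGER PRIMARY KEY,
                           mpi_version_id INTEGER, command TEXT);
""")
submit_results(conn, "1.3a1r12559", ["mpirun -np 6 dynamic/spawn"])

If, instead, each insert is autocommitted and the client re-reads a shared index between inserts, two clients submitting at about the same time can interleave, which would produce exactly the kind of cross-labelled rows reported above.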

-- Josh



On Sun, Nov/12/2006 06:04:17PM, Jeff Squyres (jsquyres) wrote:

   I feel somewhat better now.  Ethan - can you fix?
    -----Original Message-----
   From:   Tim Mattox [[1]mailto:timat...@open-mpi.org]
   Sent:   Sunday, November 12, 2006 05:34 PM Eastern Standard Time
   To:     General user list for the MPI Testing Tool
   Subject: [MTT users] Corrupted MTT database or incorrect query
   Hello,
   I just noticed that the MTT summary page is presenting
   incorrect information for our recent runs at IU.  It is
   showing failures for the 1.2b1 that actually came from
   the trunk!  See the first entry in this table:
   http://www.open-mpi.org/mtt/reporter.php?&maf_start_test_timestamp=2006-11-12%2019:12:02%20through%202006-11-12%2022:12:02&ft_platform_id=contains&tf_platform_id=IU&maf_phase=runs&maf_success=fail&by_atom=*by_test_case&go=Table&maf_agg_timestamp=-&mef_mpi_name=All&mef_mpi_version=All&mef_os_name=All&mef_os_version=All&mef_platform_hardware=All&mef_platform_id=All&agg_platform_id=off&1-page=off&no_bookmarks&no_bookmarks
   Click on the [i] in the upper right (the first entry)
   to get the popup window, which shows the mpirun cmd as:
   mpirun -mca btl tcp,sm,self -np 6 --prefix
   /san/homedirs/mpiteam/mtt-runs/odin/20061112-Testing-NOCLN/parallel-block-3/installs/ompi-nightly-trunk/odin_64_bit_gcc/1.3a1r12559/install
   dynamic/spawn
   Note the path has "1.3a1r12559" in the name... it's a run
   from the trunk, yet the table showed this as a 1.2b1 run.
   There are several of these misattributed errors.  This
   would explain why Jeff saw some ddt errors on the 1.2
   branch yesterday, but was unable to reproduce them.  They
   were from the trunk!
   --
   Tim Mattox - [2]http://homepage.mac.com/tmattox/
    tmat...@gmail.com || timat...@open-mpi.org
       I'm a bright... [3]http://www.the-brights.net/
   _______________________________________________
   mtt-users mailing list
   mtt-us...@open-mpi.org
   [4]http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users

References

   1. mailto:timat...@open-mpi.org
   2. http://homepage.mac.com/tmattox/
   3. http://www.the-brights.net/
   4. http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users



--
-Ethan

----
Josh Hursey
jjhur...@open-mpi.org
http://www.open-mpi.org/
