Kor -

Well, those messages may still be legitimate - they could be related to clock 
synchronization (a receive appearing to happen “before” a send), and there 
appear to be some unmatched messages either during initialization or 
finalization (I assume).  Unfortunately, Vampir doesn’t give any more 
information, and otf2-print doesn’t report that anything is wrong with the 
trace.  I recently added clock synchronization for MPI applications; I need to 
do the same for HPX applications.
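
For context, the basic idea behind the MPI clock synchronization is to estimate 
each rank’s clock offset relative to a reference rank and shift that rank’s 
timestamps accordingly before the trace is written.  Here is a minimal, 
illustrative sketch of that idea (a simple ping-pong offset estimate against 
rank 0) -- it is not the actual APEX code, and the function name is 
hypothetical:

/* Minimal sketch: estimate this rank's clock offset relative to rank 0
 * with a ping-pong exchange, in the spirit of what a tracing tool might
 * do after MPI_Init.  Illustrative only; not taken from APEX. */
#include <mpi.h>
#include <stdio.h>

static double estimate_offset(MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    double offset = 0.0;
    if (rank == 0) {
        /* Rank 0 answers each peer with its current clock value. */
        for (int peer = 1; peer < size; ++peer) {
            double dummy, now;
            MPI_Recv(&dummy, 1, MPI_DOUBLE, peer, 0, comm, MPI_STATUS_IGNORE);
            now = MPI_Wtime();
            MPI_Send(&now, 1, MPI_DOUBLE, peer, 0, comm);
        }
    } else {
        /* Every other rank measures the round trip and assumes the
         * reference timestamp was taken at its midpoint. */
        double t_send = MPI_Wtime(), t_ref;
        MPI_Send(&t_send, 1, MPI_DOUBLE, 0, 0, comm);
        MPI_Recv(&t_ref, 1, MPI_DOUBLE, 0, 0, comm, MPI_STATUS_IGNORE);
        double t_recv = MPI_Wtime();
        offset = t_ref - (t_send + 0.5 * (t_recv - t_send));
    }
    return offset; /* add this to local timestamps before writing them */
}

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    printf("clock offset vs rank 0: %.9f s\n", estimate_offset(MPI_COMM_WORLD));
    MPI_Finalize();
    return 0;
}

A real implementation would repeat the exchange, keep the sample with the 
smallest round trip, and ideally model drift as well, but the principle is the 
same.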

The range violation happens when there is one event after the reported “end” 
timestamp of the trace.  I take that timestamp during the “shutdown” step of 
APEX, because post-processing can take a long time (especially with 
asynchronous CUDA/HIP activity processing) and I don’t want it to dilate the 
total trace time.  At any rate, there appears to be a race condition in 
shutdown that allows events to happen after I have taken that last timestamp.
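
One way to close that hole -- just a sketch, not what APEX currently does -- 
would be to keep an atomic high-water mark of the last event timestamp and 
report the trace end as the larger of that and the shutdown timestamp:

/* Sketch: track the last event timestamp with an atomic high-water mark,
 * so a late event cannot fall outside the reported trace range.
 * Illustrative only; names are hypothetical and this is not APEX code. */
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t last_event_ts = 0;

/* Called for every recorded event, possibly from many threads. */
void note_event_timestamp(uint64_t ts)
{
    uint64_t prev = atomic_load_explicit(&last_event_ts, memory_order_relaxed);
    /* Raise the high-water mark; retry if another thread raced us. */
    while (ts > prev &&
           !atomic_compare_exchange_weak_explicit(&last_event_ts, &prev, ts,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed)) {
        /* prev now holds the value the other thread installed; loop. */
    }
}

/* Called once at shutdown, after all writers have stopped. */
uint64_t trace_end_timestamp(uint64_t shutdown_ts)
{
    uint64_t last = atomic_load_explicit(&last_event_ts, memory_order_relaxed);
    return last > shutdown_ts ? last : shutdown_ts;
}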

I’m glad you find the trace output useful!  The APEX + HPX integration has been 
a long-running collaboration between LSU and UO.

Thanks -
Kevin

> On Sep 16, 2021, at 10:53 PM, Kor de Jong <[email protected]> wrote:
> 
> Hi Kevin,
> 
> Thank you for your explanation. I now better understand how I should read the 
> Vampir trace.
> 
> You say "I assume 1 process per physical node". My trace involved 8 processes 
> on a single node. Maybe that explains the messages Vampir throws at me:
> 
> >> Event matching irregular - 79624 in total
> >> Pending messages - 120 in total
> >> Range violation - 1 in total
> 
> I will try again using a single process per node -- on multiple nodes.
> 
> BTW, being able to trace tasks this way is very useful! I don't know who is / 
> are responsible for the APEX + HPX integration, but I think it is great.
> 
> Best regards,
> Kor
> 
> 
> On 9/17/21 12:09 AM, Kevin Huck wrote:
>> Kor -
>> Sorry I didn’t reply sooner… I’m glad things are working for you now!
>> The thread naming is a bit odd, because Vampir changed how they display the 
>> names (from version 8 to version 9, I think), and I haven’t really tried 
>> that hard to make sure that the names accurately reflect the physical 
>> hardware.  But the process/thread hierarchy is correct, even if the naming 
>> looks odd.  For example:
>> CPU thread 1:1 - this is the main thread of the program, although HPX 
>> doesn’t use it to execute tasks.  APEX attributes all communication to this 
>> thread.
>> CPU thread 2:1 - this is the first worker thread spawned by HPX.
>> CPU thread 4:1 - this is the second worker thread spawned by HPX.  APEX has 
>> numbered it oddly, because thread 3 is internal to APEX.
>> CPU thread 5:1 - etc.
>> CPU thread 6:1
>> CPU thread 7:1
>> CPU thread 8:1
>> I hope that explains things.  I don’t use “hwloc” or any library like that 
>> to construct a perfectly accurate system hardware hierarchy, because it 
>> hasn’t been worth the effort. For the tracing, I assume 1 process per 
>> physical node, and as long as the OS processes and threads are annotated, it 
>> works.  Just wait until you see how I annotate the GPU threads… 🙃 (it’s 
>> actually not that bad)
>> Thanks -
>> Kevin
>>> On Sep 6, 2021, at 10:55 AM, Kor de Jong <[email protected]> wrote:
>>> 
>>> [I sent the message below to the HPX mailinglist and forgot to cc you.]
>>> 
>>> 
>>> Hi Kevin,
>>> 
>>> On 9/3/21 6:26 PM, Kevin Huck wrote:
>>>> Most versions of OTF2 (2.2 and lower, I believe) had an uninitialized 
>>>> variable that sometimes led to this error message and prematurely exited 
>>>> the initialization process, leading to other problems.  Which version of 
>>>> OTF2 are you using?
>>> 
>>> I used 2.2 but now switched to 2.3 and I applied this patch:
>>> 
>>> --- src/otf2_archive_int.c-org	2021-09-06 11:27:07.439272261 +0200
>>> +++ src/otf2_archive_int.c	2021-09-06 11:28:15.735032626 +0200
>>> @@ -1083,7 +1083,7 @@
>>>      archive->global_comm_context  = globalCommContext;
>>>      archive->local_comm_context   = localCommContext;
>>> 
>>> -    OTF2_ErrorCode status;
>>> +    OTF2_ErrorCode status = OTF2_SUCCESS;
>>> 
>>>      /* It is time to create the directories by the root rank. */
>>>      if ( archive->file_mode == OTF2_FILEMODE_WRITE )
>>> 
>>> This got rid of the error message, and a trace is now being generated. 
>>> Great! But I wonder whether the trace is correct. Vampir reports:
>>> 
>>> Event matching irregular - 79624 in total
>>> Pending messages - 120 in total
>>> Range violation - 1 in total
>>> 
>>> I posted a screenshot of the trace here:
>>> 
>>> https://surfdrive.surf.nl/files/index.php/s/MWbhZFPv733tgMX
>>> 
>>> I see 8 nested groups of 6 CPU threads, which is good. The numbering / 
>>> labeling is weird though. Each group of 6 CPUs is a process running on a 
>>> NUMA node.
>>> 
>>>>  It’s possible I have the wrong SLURM environment variables. Could you 
>>>> please do something like the following on your system (with a small test 
>>>> case) and see what you get?
>>>> `srun <srun arguments> env | grep SLURM`
>>> 
>>> My goal is to trace a job with 8 HPX processes on a single node. This node 
>>> contains 8 NUMA nodes, each containing 6 real cores.
>>> 
>>> salloc --partition=allq --nodes=1 --ntasks=8 --cpus-per-task=12 
>>> --cores-per-socket=6 env | grep SLURM
>>> 
>>> SLURM_SUBMIT_DIR=/quanta1/home/jong0137/development/project/lue
>>> SLURM_SUBMIT_HOST=login01.cluster
>>> SLURM_JOB_ID=3429945
>>> SLURM_JOB_NAME=env
>>> SLURM_JOB_NUM_NODES=1
>>> SLURM_JOB_NODELIST=node008
>>> SLURM_NODE_ALIASES=(null)
>>> SLURM_JOB_PARTITION=allq
>>> SLURM_JOB_CPUS_PER_NODE=96
>>> SLURM_JOBID=3429945
>>> SLURM_NNODES=1
>>> SLURM_NODELIST=node008
>>> SLURM_TASKS_PER_NODE=8
>>> SLURM_JOB_ACCOUNT=depfg
>>> SLURM_JOB_QOS=depfg
>>> SLURM_NTASKS=8
>>> SLURM_NPROCS=8
>>> SLURM_CPUS_PER_TASK=12
>>> SLURM_CLUSTER_NAME=cluster
>>> 
>>> 
>>> I use mpirun to start my HPX program. Not sure if this is useful, but these 
>>> are the MPI variables that are set:
>>> 
>>> Each of the 8 processes prints these same values:
>>> 
>>> OMPI_APP_CTX_NUM_PROCS=8
>>> OMPI_COMM_WORLD_LOCAL_SIZE=8
>>> OMPI_COMM_WORLD_SIZE=8
>>> OMPI_FIRST_RANKS=0
>>> OMPI_UNIVERSE_SIZE=8
>>> 
>>> These differ for each of the 8 processes:
>>> 
>>> OMPI_COMM_WORLD_LOCAL_RANK=0
>>> OMPI_COMM_WORLD_NODE_RANK=0
>>> OMPI_COMM_WORLD_RANK=0
>>> 
>>> OMPI_COMM_WORLD_LOCAL_RANK=1
>>> OMPI_COMM_WORLD_NODE_RANK=1
>>> OMPI_COMM_WORLD_RANK=1
>>> 
>>> OMPI_COMM_WORLD_LOCAL_RANK=2
>>> OMPI_COMM_WORLD_NODE_RANK=2
>>> OMPI_COMM_WORLD_RANK=2
>>> 
>>> OMPI_COMM_WORLD_LOCAL_RANK=3
>>> OMPI_COMM_WORLD_NODE_RANK=3
>>> OMPI_COMM_WORLD_RANK=3
>>> 
>>> OMPI_COMM_WORLD_LOCAL_RANK=4
>>> OMPI_COMM_WORLD_NODE_RANK=4
>>> OMPI_COMM_WORLD_RANK=4
>>> 
>>> OMPI_COMM_WORLD_LOCAL_RANK=5
>>> OMPI_COMM_WORLD_NODE_RANK=5
>>> OMPI_COMM_WORLD_RANK=5
>>> 
>>> OMPI_COMM_WORLD_LOCAL_RANK=6
>>> OMPI_COMM_WORLD_NODE_RANK=6
>>> OMPI_COMM_WORLD_RANK=6
>>> 
>>> OMPI_COMM_WORLD_LOCAL_RANK=7
>>> OMPI_COMM_WORLD_NODE_RANK=7
>>> OMPI_COMM_WORLD_RANK=7
>>> 
>>> 
>>> Thanks for looking into this!
>>> 
>>> Kor
>>> 
>> --
>> Kevin Huck, PhD
>> Research Associate / Computer Scientist
>> OACISS - Oregon Advanced Computing Institute for Science and Society
>> University of Oregon
>> [email protected]
>> http://tau.uoregon.edu
>> http://oaciss.uoregon.edu

--
Kevin Huck, PhD
Research Associate / Computer Scientist
OACISS - Oregon Advanced Computing Institute for Science and Society
University of Oregon
[email protected]
http://tau.uoregon.edu
http://oaciss.uoregon.edu


_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users
