WHAT: Should we make the job size (i.e., initial number of procs) available in 
OPAL?

WHY: At least 2 BTLs are using this info (*more below)

WHERE: usnic and ugni

TIMEOUT: there have already been some inflammatory emails about this; let's 
discuss next Tuesday on the teleconf: Tue, 5 Aug 2014

MORE DETAIL:

This is an open question.  We *have* the information at the time that the BTLs 
are initialized: do we allow that information to go down to OPAL?

Ralph added this info down in OPAL in r32355, but George reverted it in r32361.

Points for: YES, WE SHOULD
+++ 2 BTLs were using it (usnic, ugni)
+++ Other RTE job-related info is already in OPAL (num local ranks, local rank)

Points for: NO, WE SHOULD NOT
--- What exactly is this number (e.g., num currently-connected procs?), and 
when is it updated?
--- We need to precisely delineate what belongs in OPAL vs. above-OPAL

FWIW: here's how ompi_process_info.num_procs was used before the BTL move down 
to OPAL:

- usnic: for a minor latency optimization / sizing of a shared receive buffer 
queue length, and for the initial size of a peer lookup hash
- ugni: to determine the size of the per-peer buffers used for send/recv 
communication
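
To make the kind of usage above concrete, here is a minimal sketch in C of how 
a BTL might turn a job size into a queue length and an initial hash size. This 
is NOT the usnic or ugni code: every name below (sketch_job_info_t, 
sketch_recv_queue_len, sketch_hash_initial_size) is hypothetical, and the 
clamping and power-of-two rounding rules are assumptions chosen only for 
illustration. It also assumes that a job size of 0 means "not available", in 
which case the caller's default is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the job size as seen from a BTL.
 * Assumption: num_procs == 0 means the value is not available. */
typedef struct {
    size_t num_procs;
} sketch_job_info_t;

/* Hypothetical sizing rule: one receive slot per process in the job,
 * clamped to [min_len, max_len]; falls back to min_len if unknown. */
static size_t sketch_recv_queue_len(const sketch_job_info_t *info,
                                    size_t min_len, size_t max_len)
{
    assert(min_len <= max_len);

    size_t len = (info->num_procs > 0) ? info->num_procs : min_len;
    if (len < min_len) {
        len = min_len;
    }
    if (len > max_len) {
        len = max_len;
    }
    return len;
}

/* Hypothetical initial size for a peer lookup hash: the job size (or
 * the fallback if unknown), rounded up to the next power of two. */
static size_t sketch_hash_initial_size(const sketch_job_info_t *info,
                                       size_t fallback)
{
    size_t want = (info->num_procs > 0) ? info->num_procs : fallback;
    size_t size = 1;

    /* The second condition stops the shift from overflowing. */
    while (size < want && size <= SIZE_MAX / 2) {
        size <<= 1;
    }
    return size;
}
```

In this sketch the job size is only a sizing hint: both helpers still return 
something usable when the value is missing, which is one way a BTL could cope 
if the number is not pushed down to OPAL.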

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
