Hi,

Is there an algorithm or high-level description of how the shared-memory (vader) BTL component works? I'm interested in the part that does copy-in/copy-out (CICO) through shared memory, not the single-copy path that relies on kernel support. Also, how do the collective communication algorithms use the vader BTL?

Thanks!
Best,
Zhiting