>BTW, are you sure about the units on fig. 3 ? 4 seconds to serialize
>20KB of data is not especially High Performance... unless of course you
>were running HPX on a toaster :-)

Crap! The units should not be seconds. I will fix my local copy and contact the 
publishers in case there is still time to have it changed.

Regarding the use of different allocators: there is a virtual base class for 
the allocator/pool that provides the memory, so that we can use different 
allocators for the ibverbs and libfabric implementations. It ought to be OK 
to provide an hpx::compute::gpu::allocator and allow it to hand out memory for 
the rma objects. Provided the received data (over the network?) isn't passed 
to a compute host with a different pinning requirement, all is fine. If 
received data is in a network rma buffer and is then passed directly to the gpu 
transfer, we might need a way to pin the memory using both APIs, network and 
gpu, but that can all be handled in the allocator abstraction itself, which is 
generic and easy to extend.
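
To make the idea concrete, here is a rough sketch of what I mean by handling it in the allocator abstraction. None of these names (memory_pool_base, pinned_pool, rma_buffer) are taken from the rma_object branch; they are made up for illustration, and the actual pinning calls are left as comments because they depend on which network and gpu APIs are in play.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface, not the real HPX one: a virtual base class
// that hides how the memory is obtained and pinned.
struct memory_pool_base {
    virtual ~memory_pool_base() = default;
    virtual void* allocate(std::size_t bytes) = 0;
    virtual void deallocate(void* p) = 0;
    // Names of the APIs this pool's memory is (notionally) pinned for.
    virtual const std::vector<std::string>& pinned_for() const = 0;
};

// Illustrative pool: records which APIs it would pin for.
// A real pool would call each API's registration routine in allocate()
// and the matching deregistration routine in deallocate().
class pinned_pool : public memory_pool_base {
public:
    explicit pinned_pool(std::vector<std::string> apis)
      : apis_(std::move(apis)) {}

    void* allocate(std::size_t bytes) override {
        void* p = std::malloc(bytes);
        if (p != nullptr) ++live_;   // pinning for every entry in apis_ goes here
        return p;
    }

    void deallocate(void* p) override {
        if (p == nullptr) return;
        assert(live_ > 0);
        --live_;                     // unpinning goes here, before the free
        std::free(p);
    }

    const std::vector<std::string>& pinned_for() const override { return apis_; }
    std::size_t live() const { return live_; }

private:
    std::vector<std::string> apis_;
    std::size_t live_ = 0;
};

// Hypothetical rma object: only knows the base class, so the pool
// decides whether the buffer is pinned for network, gpu, or both.
class rma_buffer {
public:
    rma_buffer(memory_pool_base& pool, std::size_t bytes)
      : pool_(pool), data_(pool.allocate(bytes)), size_(bytes) {}
    ~rma_buffer() { pool_.deallocate(data_); }

    rma_buffer(const rma_buffer&) = delete;
    rma_buffer& operator=(const rma_buffer&) = delete;

    void* data() const { return data_; }
    std::size_t size() const { return size_; }

    // True if the owning pool pins its memory for the given API.
    bool usable_by(const std::string& api) const {
        for (const auto& a : pool_.pinned_for())
            if (a == api) return true;
        return false;
    }

private:
    memory_pool_base& pool_;
    void* data_;
    std::size_t size_;
};
```

With something like this, a buffer that only ever lives on the network side comes from a pool constructed with {"network"}, and a buffer that will be handed straight to a gpu transfer comes from a pool constructed with {"network", "gpu"}. The rma object code does not change either way.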

Once I work on this again, I'll let you know, and if you have specific features 
you want to try out, I can work with you. Sorry if my current deadlines push 
this out too far for you.

Should you want to have a go at it, there is an rma_object branch on GitHub 
that I need to get merged into master sooner rather than later.

hpx-users mailing list
