Thanks a lot for suggesting `serialize_buffer`. I will try to implement
it with a custom allocator that allocates the appropriate type of pinned
memory depending on the capabilities of the node (i.e. `cudaMallocHost()`
if there is a GPU, and `malloc()` + `mlock()` otherwise). Modifying the
`InputData` type is not a problem here.
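
For illustration, such an allocator could be sketched roughly as follows.
This is only a sketch under assumptions, not HPX code: the names
`pinned_allocator` and `node_has_gpu` and the `PINNED_USE_CUDA` build flag
are hypothetical, the `mlock()` call is treated as best effort (it can fail
when the locked-memory limit is low), and whether the type can be plugged
directly into `serialize_buffer` depends on that class's allocator
requirements, which are not checked here.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>
#include <sys/mman.h>   // mlock, munlock (POSIX)
#include <unistd.h>     // sysconf

#ifdef PINNED_USE_CUDA  // hypothetical build flag: define it when building with CUDA
#include <cuda_runtime.h>
#endif

// Returns true if at least one CUDA device is visible on this node.
// Without PINNED_USE_CUDA the answer is always false.
inline bool node_has_gpu() {
#ifdef PINNED_USE_CUDA
    static const bool has_gpu = [] {
        int count = 0;
        return cudaGetDeviceCount(&count) == cudaSuccess && count > 0;
    }();
    return has_gpu;
#else
    return false;
#endif
}

// Hypothetical allocator: cudaMallocHost() on GPU nodes,
// page-aligned malloc + mlock() everywhere else.
template <typename T>
struct pinned_allocator {
    using value_type = T;

    pinned_allocator() = default;
    template <typename U>
    pinned_allocator(const pinned_allocator<U>&) noexcept {}

    static std::size_t page_size() {
        return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    }

    // Round the request up to whole pages so that locking and unlocking
    // never touch pages shared with other allocations.
    static std::size_t padded_bytes(std::size_t n) {
        const std::size_t ps = page_size();
        return ((n * sizeof(T) + ps - 1) / ps) * ps;
    }

    T* allocate(std::size_t n) {
        const std::size_t bytes = padded_bytes(n);
        void* p = nullptr;
#ifdef PINNED_USE_CUDA
        if (node_has_gpu()) {
            if (cudaMallocHost(&p, bytes) != cudaSuccess) throw std::bad_alloc();
            return static_cast<T*>(p);
        }
#endif
        if (posix_memalign(&p, page_size(), bytes) != 0) throw std::bad_alloc();
        // Best effort: if mlock() fails, the memory stays usable but unpinned.
        (void)mlock(p, bytes);
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t n) noexcept {
#ifdef PINNED_USE_CUDA
        if (node_has_gpu()) {
            cudaFreeHost(p);
            return;
        }
#endif
        (void)munlock(p, padded_bytes(n));
        std::free(p);
    }
};

// The allocator is stateless, so any two instances are interchangeable.
template <typename T, typename U>
bool operator==(const pinned_allocator<T>&, const pinned_allocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const pinned_allocator<T>&, const pinned_allocator<U>&) noexcept {
    return false;
}
```

It can be tried out with a standard container, e.g.
`std::vector<double, pinned_allocator<double>> buf(1 << 20);`. Making the
GPU check a cached runtime query keeps the same binary usable on nodes
with and without a GPU, at the cost of requiring the CUDA runtime at
link time whenever the flag is defined.
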
John's `rma_object<>` seems very interesting as well, and is probably
the way to go in the long term. I will try to dig into the code and look
at his implementation of a pinned allocator to see how I could adapt it
to my use case.
If you are interested, I can keep you updated once I have a working
prototype (even if it is not zero-copy yet).