Jean-Loup,

This is something I've been working towards, but unfortunately the code is not
quite ready for you yet. I have begun work on rma_object<T> types that use a
custom allocator backed by pinned memory; so far I have integrated them into
the libfabric parcelport, but the work is not finished yet.
The current status is described in this paper 
ftp://ftp.cscs.ch/out/biddisco/hpx/Applied-Computing-HPX-ZeroCopy.pdf and I 
think when you see the description, you'll want to use it for your GPU data.

In principle it ought to be fairly straightforward to make this work, but in
practice it will require quite a bit of poking around in the HPX internals to
get it going. If you are not desperate and can wait a few months, I will be
resuming this work in September with extensions to the rma_object types so
that you can perform put/get operations on remote nodes with them directly,
rather than invoking a 'copy' action. For the GPU, we would map the rma
put/get onto a CUDA copy operation using the pinned memory of the underlying
object.
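
Just to sketch the idea: for a GPU target, the put side would bottom out in
something like the code below. The gpu_put helper and its arguments are made
up for illustration; only the CUDA runtime calls are real, and the copy only
overlaps with other work because the host buffer is pinned.

    #include <cuda_runtime.h>
    #include <cstddef>

    // Hypothetical helper: what an rma put targeting a GPU locality could
    // reduce to once the host side of the rma_object lives in pinned memory.
    void gpu_put(float const* pinned_host_buf, float* device_ptr,
                 std::size_t n, cudaStream_t stream)
    {
        // A pinned (page-locked) source is what allows this copy to be
        // truly asynchronous and to overlap with kernels on other streams.
        cudaMemcpyAsync(device_ptr, pinned_host_buf, n * sizeof(float),
                        cudaMemcpyHostToDevice, stream);
    }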

Overloading the rma_object types to use a different allocator, taken from the
GPU, would actually be fairly easy, but I have not looked at the action
handling for GPU targets, so I'd need to ponder that.
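
The allocator part itself is simple enough. A minimal pinned-host allocator
(sketch only; the name is made up and the real rma allocators look a bit
different) would be something like:

    #include <cuda_runtime.h>
    #include <cstddef>
    #include <new>

    // Standard-library-compatible allocator backed by page-locked memory,
    // so anything it allocates is usable for asynchronous CUDA copies.
    template <typename T>
    struct pinned_host_allocator
    {
        using value_type = T;

        T* allocate(std::size_t n)
        {
            void* p = nullptr;
            if (cudaMallocHost(&p, n * sizeof(T)) != cudaSuccess)
                throw std::bad_alloc();
            return static_cast<T*>(p);
        }

        void deallocate(T* p, std::size_t) noexcept
        {
            cudaFreeHost(p);
        }
    };

    template <typename T, typename U>
    bool operator==(pinned_host_allocator<T> const&,
                    pinned_host_allocator<U> const&) { return true; }
    template <typename T, typename U>
    bool operator!=(pinned_host_allocator<T> const&,
                    pinned_host_allocator<U> const&) { return false; }

A device-memory variant would swap cudaMallocHost/cudaFreeHost for
cudaMalloc/cudaFree; the open question is the action handling, not the
allocation itself.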

Basically, the answer to your question is: there will be a way to do what you
want, but not yet.

JB



________________________________________
From: [email protected] 
[[email protected]] on behalf of Jean-Loup Tastet 
[[email protected]]
Sent: 07 August 2017 14:54
To: [email protected]
Cc: Felice Pantaleo
Subject: [hpx-users] Receiving action arguments on pinned memory

Hi all,

I am currently trying to use HPX to offload computationally intensive tasks to 
remote GPU nodes. In idiomatic HPX, this would typically be done by invoking a 
remote action:

    OutputData compute(InputData input_data)
    {
        OutputData results;
        /* Asynchronously copy `input_data` to the device using DMA */
        /* Do the work on the GPU */
        /* Copy the results back to the host */
        return results;
    }

    HPX_PLAIN_ACTION(compute, compute_action);

    // In sender code
    auto fut = hpx::async(compute_action(), remote_locality_with_gpu,
                          std::move(input_data));

So far, so good.

However, an important requirement is that the memory allocated for the input 
data on the receiver end be pinned, to enable asynchronous copy between the 
host and the GPU. This can of course always be done by copying the argument 
`input_data` to pinned memory within the function body, but I would prefer to 
avoid any superfluous copy in order to minimize the overhead.
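
To make that fallback concrete, this is roughly what I would end up doing
inside the action body (a sketch only; the layout of `InputData` and the
`stage_to_pinned` helper are assumptions for illustration):

    #include <cuda_runtime.h>
    #include <cstddef>
    #include <cstring>
    #include <vector>

    struct InputData { std::vector<float> values; };  // assumed layout

    // Copy the (pageable) action argument into a pinned staging buffer so
    // that the host-to-device transfer can actually run asynchronously.
    void stage_to_pinned(InputData const& input_data, float* device_ptr,
                         cudaStream_t stream)
    {
        std::size_t bytes = input_data.values.size() * sizeof(float);

        float* staging = nullptr;
        cudaMallocHost(reinterpret_cast<void**>(&staging), bytes);  // pinned
        std::memcpy(staging, input_data.values.data(), bytes);      // the copy
                                                                     // I want
                                                                     // to avoid
        cudaMemcpyAsync(device_ptr, staging, bytes,
                        cudaMemcpyHostToDevice, stream);
        // `staging` must outlive the asynchronous copy and later be released
        // with cudaFreeHost (omitted here).
    }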

Do you know if it is possible to control, within HPX, where the memory for
the input data will be allocated on the receiver end? I tried to use the
`pinned_allocator` from the Thrust library for the data members of
`InputData`, and although it did its job as expected, it also requires
allocating pinned memory on the sender side (for the construction of the
object), as well as the presence of the Thrust library and the CUDA runtime
on both machines. This led me to think that there should be a better way.
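
For reference, the experiment described above amounts to something like this
sketch (assuming the Thrust version in use still provides the allocator under
this header):

    #include <hpx/include/serialization.hpp>
    #include <thrust/system/cuda/experimental/pinned_allocator.h>
    #include <vector>

    // Every vector inside InputData is forced into pinned memory, on the
    // sender as well as on the receiver, which is the part I dislike.
    struct InputData
    {
        std::vector<float,
            thrust::system::cuda::experimental::pinned_allocator<float>>
            values;

        template <typename Archive>
        void serialize(Archive& ar, unsigned) { ar & values; }
    };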

Ideally, I would be able to deserialize the incoming data directly into
pinned memory. Do you know if there is a way to do this, or something
similar, in HPX? If not, do you think it is possible to emulate such
functionality by directly using the low-level constructs / internals of HPX?
This is for a prototype, so it is okay to use unstable / undocumented code as
long as it allows me to prove the feasibility of the approach.
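
For instance, something along the following lines is what I have in mind. It
is only a guess at how it might look, using what I understand to be the split
load/save support of HPX's serialization; the `pinned_buffer` type itself is
made up:

    #include <hpx/include/serialization.hpp>
    #include <cuda_runtime.h>
    #include <cstddef>

    // Made-up type whose contents always land in pinned memory when it is
    // deserialized on the receiving locality.
    struct pinned_buffer
    {
        float* data = nullptr;
        std::size_t size = 0;

        pinned_buffer() = default;
        ~pinned_buffer() { if (data) cudaFreeHost(data); }
        // copy/move handling omitted for brevity

        template <typename Archive>
        void save(Archive& ar, unsigned) const
        {
            ar & size;
            for (std::size_t i = 0; i != size; ++i)
                ar & data[i];
        }

        template <typename Archive>
        void load(Archive& ar, unsigned)
        {
            ar & size;
            // Deserialize straight into page-locked host memory.
            cudaMallocHost(reinterpret_cast<void**>(&data),
                           size * sizeof(float));
            for (std::size_t i = 0; i != size; ++i)
                ar & data[i];
        }

        HPX_SERIALIZATION_SPLIT_MEMBER()
    };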

I would greatly appreciate any input / suggestions on how to approach this 
issue. If anyone has experience using HPX with GPUs or on heterogeneous 
clusters, I would be very interested in hearing about it as well.

Best regards,
Jean-Loup Tastet
_______________________________________________
hpx-users mailing list
[email protected]
https://mail.cct.lsu.edu/mailman/listinfo/hpx-users