
> I was wondering whether anyone could provide some direction as to the
> recommended approach to incorporating GPU-based execution within an HPX
> application, either in general or, preferably if possible, for how I
> (think I) want to use it:
> Given a number of host systems (CPU), each possibly containing one or more
> GPGPUs.  For the sake of simplicity, right now all of my GPUs are NVidia.
> The specific application I'm working on is a distributed optimizer
> implementing a global search algorithm (right now, I have implemented
> Differential Evolution).  In its distributed form, it treats each node as
> a separate island with occasional migration.
> When run on separate SLURM / HPX (CPU-based) nodes, each node has a set of
> N trial parameter vectors.  Each node iterates independently.  When
> certain conditions are met, a node can initiate a migration of data from a
> logically-adjacent node.
> Given the feature sets of the latest versions of CUDA, I believe that it
> should be possible for me to treat each GPGPU as a node as well.  I
> realize it won't be a native node via SLURM (as an Intel Phi node might
> be), but rather initialized in some other way after the application is
> initialized across the cluster.
> I've seen the HPXCL project repo, but it hasn't had any updates in more
> than 4 months.  I've also seen at one time, though I cannot find right
> now, scattered bits of code around the internet with a couple different
> approaches.  Finally, I am under the impression that work is being done on
> executors to provide this or related functionality.
> Given, I think, that this should be a reasonably common use-case, it might
> be helpful if some official tutorials are generated at some point to
> assist users with this.  In the meanwhile, if anyone (and everyone) can
> provide some general direction or guidance on this, I would be greatly
> appreciative.  I would, also, be very willing to try and provide some
> additional tutorials and user documentation for this application.
> Thank you and I would greatly appreciate any help that you can provide.

We've made several attempts at providing higher-level support for integrating
GPGPUs with HPX. HPXCL is one of those; HPX.Compute is another. Neither has
resulted in something we're satisfied with. However, we have created a couple
of lower-level facilities that I believe are useful.

The main idea is (as always) to expose the operations the CPU schedules on the
device (data transfer, running a kernel, etc.) through API functions that
return an hpx::future. That allows for nicely hiding latencies and
communication overheads. It also allows integrating the GPU work into the
overall asynchronous execution flow on the CPU.
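
To make the pattern concrete, here is a minimal sketch in plain C++. It uses
std::future and std::async purely as stand-ins for hpx::future and the real
device calls; fake_device, copy_to_device, run_sum_kernel and offload_sum are
hypothetical names invented for this illustration and are not part of HPX,
HPXCL or HPX.Compute.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU: "device memory" is just a host vector.
struct fake_device {
    std::vector<double> memory;
};

// A "data transfer" that returns immediately and hands back a future.
std::future<void> copy_to_device(fake_device& dev, std::vector<double> host) {
    return std::async(std::launch::async,
        [&dev, h = std::move(host)]() mutable { dev.memory = std::move(h); });
}

// A "kernel launch" that returns a future holding the kernel's result.
std::future<double> run_sum_kernel(fake_device const& dev) {
    return std::async(std::launch::async, [&dev] {
        return std::accumulate(dev.memory.begin(), dev.memory.end(), 0.0);
    });
}

// Hypothetical driver: start the copy, then run the kernel that depends on it.
double offload_sum(std::vector<double> data) {
    fake_device dev;
    std::future<void> copied = copy_to_device(dev, std::move(data));
    assert(copied.valid());
    // The CPU is free to do unrelated work here while the copy is in flight.
    copied.get();  // the kernel must not start before the data has arrived
    return run_sum_kernel(dev).get();
}
```

With hpx::future the blocking get() between the two steps would normally be
replaced by attaching the kernel launch as a continuation (.then), so that no
CPU thread sits waiting on the device.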

HPX.Compute has also introduced the concept of 'targets' (i.e. places in the 
system) that can be used to create a) allocators, and b) executors. This is 
important to be able to control data and execution placement from user land 
while still using system facilities like parallel algorithms, etc.
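
As a rough illustration of how a target can tie data placement and execution
placement together, here is a sketch in plain C++. All names (target,
target_allocator, target_executor, make_squares) are hypothetical and chosen
for this example only; the "device" memory is ordinary host memory and the
executor simply runs the loop inline, so this shows the shape of the idea
rather than the HPX.Compute interfaces.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical "place in the system"; a real target would name a GPU.
struct target {
    int device_id;
};

// Allocator bound to a target. This sketch allocates on the host and only
// records which target the memory is meant to live on.
template <typename T>
struct target_allocator {
    using value_type = T;
    target where;

    explicit target_allocator(target t) : where(t) {}
    template <typename U>
    target_allocator(target_allocator<U> const& other) : where(other.where) {}

    T* allocate(std::size_t n) { return std::allocator<T>{}.allocate(n); }
    void deallocate(T* p, std::size_t n) { std::allocator<T>{}.deallocate(p, n); }
};

template <typename T, typename U>
bool operator==(target_allocator<T> const& a, target_allocator<U> const& b) {
    return a.where.device_id == b.where.device_id;
}
template <typename T, typename U>
bool operator!=(target_allocator<T> const& a, target_allocator<U> const& b) {
    return !(a == b);
}

// Executor bound to the same target. A real one would launch a kernel;
// this one calls f(i) for each index on the calling thread.
struct target_executor {
    target where;
    template <typename F>
    void for_each_index(std::size_t n, F f) const {
        for (std::size_t i = 0; i < n; ++i) f(i);
    }
};

// User code picks the target once; data and work both follow it.
std::vector<double, target_allocator<double>> make_squares(target t, std::size_t n) {
    target_allocator<double> alloc(t);
    std::vector<double, target_allocator<double>> v(n, 0.0, alloc);
    target_executor exec{t};
    exec.for_each_index(n, [&v](std::size_t i) { v[i] = static_cast<double>(i * i); });
    return v;
}
```

The point of the design is that the container and the loop are written
against the allocator and executor, so moving both to a different device
means changing only the target passed in.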

HPXCL has solved the problem of remote GPU access by encapsulating a device in
an HPX component, which allows submitting data and kernels for execution to a
remote device.

There is essentially no documentation of any of the above. As I said, we were 
not satisfied with the overall design or implementation, so we decided not to 
spend time on describing things.

As always, any help you might be willing to give would be highly appreciated.

Regards Hartmut

hpx-users mailing list
