Bruno,

It is nice to be in touch with you again. I still remember that we met and
chatted briefly at the 2013 deal.II workshop held at Texas A&M University.

Thank you for letting me get involved in this development. Once I have
installed the development version of deal.II on the server I am using and
really get started, I am sure I will have many questions for you about the
CUDA side of deal.II, and I will need your help.

After taking a quick glimpse at the code to understand the CUDA style you
implemented, I have a quick question for you:

When you tested the CUDA code, you used cudaMalloc to allocate an object on
the device (GPU) and then copied the results from the device to the host
with cudaMemcpy. As far as I know, CUDA now offers a simpler way using
Unified Memory, which provides a single memory space accessible by all GPUs
and CPUs in the system, backed by the efficient page-migration engine of
the recently released NVIDIA Tesla P100. The implementation looks like the
following CUDA code.

=================Unified Memory==========================
// Kernel: element-wise y[i] = x[i] + y[i]
__global__ void add(int n, float *x, float *y)
{
  for (int i = 0; i < n; i++)
    y[i] = x[i] + y[i];
}

int main()
{
  int N = 10000;
  float *x, *y;

  // Allocate Unified Memory – accessible from CPU or GPU
  cudaMallocManaged(&x, N*sizeof(float));
  cudaMallocManaged(&y, N*sizeof(float));

  // Initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Launch the kernel on the GPU
  add<<<1, 1>>>(N, x, y);

  // Wait for the GPU to finish before touching y on the host
  cudaDeviceSynchronize();

  // Free memory
  cudaFree(x);
  cudaFree(y);
  return 0;
}
=======================================================
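For comparison, here is a minimal sketch of the same operation written with
explicit cudaMalloc/cudaMemcpy, which I believe is the style you used. This is
my own illustration (not code from deal.II); it assumes a simple element-wise
add kernel:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical element-wise kernel: y[i] = x[i] + y[i]
__global__ void add(int n, float *x, float *y)
{
  for (int i = 0; i < n; i++)
    y[i] = x[i] + y[i];
}

int main()
{
  int N = 10000;
  size_t bytes = N * sizeof(float);

  // Separate host and device allocations
  float *h_x = (float *)malloc(bytes);
  float *h_y = (float *)malloc(bytes);
  float *d_x, *d_y;
  cudaMalloc(&d_x, bytes);
  cudaMalloc(&d_y, bytes);

  // Initialize on the host
  for (int i = 0; i < N; i++) {
    h_x[i] = 1.0f;
    h_y[i] = 2.0f;
  }

  // Explicit host-to-device copies before the launch
  cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);

  add<<<1, 1>>>(N, d_x, d_y);

  // Explicit device-to-host copy of the result
  // (cudaMemcpy synchronizes with the kernel here)
  cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);

  printf("y[0] = %f\n", h_y[0]);

  cudaFree(d_x);
  cudaFree(d_y);
  free(h_x);
  free(h_y);
  return 0;
}
```

With Unified Memory the two cudaMemcpy calls and the duplicate host pointers
disappear, which is why I find the managed version simpler to read.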

You might also want to take a look at this post about Unified Memory:

https://devblogs.nvidia.com/parallelforall/cuda-8-features-revealed/

Is there a particular reason you still used the explicit approach? Is it
because your GPU accelerator was released some years ago and therefore
cannot use this newer, simpler way?

Thanks!

Sincerely,

Chih-Che


On Tue, Aug 15, 2017 at 7:19 PM, Bruno Turcksin <[email protected]>
wrote:

> Chih-Che,
>
> The CUDA support in deal.II is very new. It is only in the development
> version of deal (https://github.com/dealii/dealii). You can see our
> current development plan here https://github.com/dealii/dealii/projects/2
> Right now, we have support for vector and partial support for matrix-free
> (a good place to see the capabilities is the test suite
> https://github.com/dealii/dealii/tree/master/tests/cuda). I would advise
> you to wait for this PR https://github.com/dealii/dealii/pull/4846 to be
> merged before you try to install deal with CUDA. This PR makes it a lot
> easier to install deal with CUDA. If you want to work on CUDA, you should
> work on something that you like / is of interest to you. If you want to
> help our existing effort, you can pick something from this list
> https://github.com/dealii/dealii/issues/4399. I am working on the first
> item but let me know if you find anything else interesting, I can help you
> implementing it.
>
> If you have any questions, please ask. We are looking for people to help
> us with CUDA.
>
> Best,
>
> Bruno
>
>
> On Monday, August 14, 2017 at 7:56:45 AM UTC-4, Chih-Che Chueh wrote:
>>
>> Dear deal.II developers and users,
>>
>> Recently, I spent some spare time assimilating CUDA C programming in the
>> last few months, and I already know very well how to use CUDA stream
>> events to let CPU and kernel (GPU) execution work asynchronously with
>> efficiently overlapping data transfer between CPU and GPU, how to use
>> shared memory to ensure global memory coalescing efficiently, how to map
>> threads to matrix elements either using CARTESIAN x, y, z or a row/column
>> mapping in GPU, as well as how to use shared memory to enhance data
>> reuse. Most importantly, for actual practice, we have a GPU accelerator
>> (i.e. NVIDIA Tesla K40) that was bought last year. I plan to use the CUDA C
>> programming to deal with big data or image identification with artificial
>> intelligence (deep learning) for atmospheric data.
>>
>> Anyway, I am writing to ask if I could get involved with a deal.II
>> project of people who are working on asynchronous adaptive mesh refinement
>> for acceleration or other performance improvement in deal.II with CUDA C
>> programming.
>>
>> Thanks!
>>
>> Sincerely,
>>
>> Chih-Che
>>
> --
> The deal.II project is located at http://www.dealii.org/
> For mailing list/forum options, see https://groups.google.com/d/
> forum/dealii?hl=en
> ---
> You received this message because you are subscribed to the Google Groups
> "deal.II User Group" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
