Thank you very much, Roy and Cody.
I ran some tests with ex4 in libMesh.
When using one CPU core, the time spent in the linear solver and in matrix
assembly is as follows:
| Event                      | # Calls | Time, self (s) | Time, w/ sub (s) | % self | % w/ sub |
|----------------------------|---------|----------------|------------------|--------|----------|
| PetscLinearSolver::solve() |       1 |        69.5099 |          69.5099 |  75.78 |    75.78 |
| System::assemble()         |       1 |        14.3838 |          18.0873 |  15.68 |    19.72 |
| Totals                     |   22294 |        91.7241 |                  | 100.00 |          |
When using a GPU card, the time is as follows:
| Event                      | # Calls | Time, self (s) | Time, w/ sub (s) | % self | % w/ sub |
|----------------------------|---------|----------------|------------------|--------|----------|
| PetscLinearSolver::solve() |       1 |        28.6089 |          28.6089 |   3.62 |     3.62 |
| System::assemble()         |       1 |       752.7547 |         756.7779 |  95.37 |    95.88 |
| Totals                     |   22294 |       789.3232 |                  | 100.00 |          |
To my understanding, there are two possible reasons for the very slow matrix
assembly on the GPU:
1. there is no memory preallocation, and
2. frequent memory transfers between the host and the GPU.
As Jed mentioned, I am not sure whether libMesh does memory preallocation, or
whether it survives the switch of the matrix type. If it does, the first
reason can be ruled out.
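One way to check this is a minimal sketch with MatGetInfo(); here "mat" stands
for the PETSc Mat underlying libMesh's PetscMatrix and the variable name is
only illustrative. A nonzero malloc count after assembly means the
preallocation was missing or insufficient:

  /* after assemble() has filled the matrix */
  MatInfo info;
  ierr = MatGetInfo(mat, MAT_GLOBAL_SUM, &info); CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD,
                     "mallocs during MatSetValues(): %g, nz allocated: %g, nz used: %g\n",
                     info.mallocs, info.nz_allocated, info.nz_used); CHKERRQ(ierr);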
Thanks a lot,
Yujie
On 2/13/12, Cody Permann <[email protected]> wrote:
> On Mon, Feb 13, 2012 at 11:01 AM, Roy Stogner
> <[email protected]>wrote:
>
>>
>> On Mon, 13 Feb 2012, recrusader wrote:
>>
>>> I am not sure whether Jed mentioned the following to you.
>>> Now I can run GPU computations with libMesh and PETSc on one or two
>>> cards.
>>> However, compared to the CPU implementation, matrix assembly on the GPU is
>>> very slow.
>>>
>>
> This is to be expected - if you think about what happens in assembly, we are
> looping over elements and building up just a "few" values at a time for our
> local element matrices. We then scatter those into the PETSc matrix and
> vector. The startup cost and speed of the GPU processors mean that you need
> to be doing hundreds or thousands of computations to overcome the memory
> transfer costs. I've done preliminary work in libMesh showing that this "can"
> be done efficiently, but it is non-trivial and will require significant
> changes in the way these algorithms are designed.
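>
> Roughly, the per-element scatter looks like the following sketch in PETSc
> terms (the sizes and variable names are illustrative, not libMesh's actual
> internals):
>
>   /* Ke: small dense element matrix stored row-major, e.g. 8x8 = 64 entries;
>      dof_indices: the element's global degree-of-freedom numbers. */
>   ierr = MatSetValues(mat, n_dofs, dof_indices, n_dofs, dof_indices,
>                       Ke, ADD_VALUES); CHKERRQ(ierr);
>   /* Each call hands PETSc only n_dofs*n_dofs values, so a per-call
>      host-to-device transfer would dwarf the useful work. */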
>
> I'd like to get back to this problem but we (MOOSE) have been swamped with
> "real" work :) Are you looking to do research in this area or just hoping
> for a quick speedup of your existing problems?
>
>
>> I take it the problem is that when you switch the matrix to a
>> GPU-based type, the sparsity data is lost?
>>
>
> Like I said above, the problem is a little more complicated than that. You
> need to consider the amount of work done per GPU call, which, as libMesh is
> currently structured, is way too low.
>
> Cody
>
>
>>> Is it easy to add the following code in libMesh?
>>>
>>
>> I have no idea. Forwarding to libmesh-devel, in case any of the more
>> PETSc-literate developers can say more.
>> ---
>> Roy
>>
>> Thank you very much,
>>> Yujie
>>>
>>> ---------- Forwarded message ----------
>>> From: Jed Brown <[email protected]>
>>> Date: Sat, Feb 11, 2012 at 11:06 AM
>>> Subject: Re: [petsc-users] MatSetValues() for MATMPICUSP
>>> To: PETSc users list <[email protected]>
>>>
>>>
>>> On Sat, Feb 11, 2012 at 10:58, recrusader <[email protected]> wrote:
>>> " ierr = MatCreateMPIAIJ (libMesh::COMM_WORLD,
>>>
>>> m_local, n_local,
>>> ** m_global, n_global,
>>> ** PETSC_NULL, (int*) &n_nz[0],
>>> ** PETSC_NULL, (int*) &n_oz[0], &_mat);
>>> CHKERRABORT(libMesh::COMM_**WORLD,ierr);
>>>
>>> MatSetOption(_mat,MAT_NEW_**NONZERO_ALLOCATION_ERR,PETSC_**FALSE);
>>> //by Yujie
>>> std::cout<<"MatSetOption"<<**std::endl;"
>>>
>>> I run the same code in CPU and GPU modes (with the same parameters, except
>>> that the GPU run uses '-vec_type mpicusp -mat_type
>>> mpiaijcusp'). I see the "MatSetOption" output in both modes. Does that mean
>>> the code sets this option in both modes?
>>>
>>>
>>> Libmesh is calling MatSetFromOptions() after MatCreateMPIAIJ(), which
>>> means the preallocation information will be lost if the
>>> type is changed. The code should be written differently:
>>> ierr = MatCreate(comm,A);CHKERRQ(ierr);
>>> ierr = MatSetSizes(*A,m,n,M,N);CHKERRQ(ierr);
>>> ierr = MPI_Comm_size(comm,&size);CHKERRQ(ierr);
>>> ierr = MatSetType(*A,MATAIJ);CHKERRQ(ierr);
>>> ierr = MatSetOptionsPrefix(*A,optional_prefix);CHKERRQ(ierr);
>>> ierr = MatSetFromOptions(*A);CHKERRQ(ierr);
>>> ierr = MatMPIAIJSetPreallocation(*A,d_nz,d_nnz,o_nz,o_nnz);CHKERRQ(ierr);
>>> ierr = MatSeqAIJSetPreallocation(*A,d_nz,d_nnz);CHKERRQ(ierr);
>>>
>>> I can talk to the libmesh developers about making this change.
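>>>
>>> With the calls ordered this way, the preallocation is applied after
>>> MatSetFromOptions(), so a run-time type switch such as the one used in the
>>> test above should keep it; for example (just a sketch, reusing the example
>>> and options already mentioned in this thread):
>>>
>>>   ./ex4 -vec_type mpicusp -mat_type mpiaijcusp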
>>
>>
>>
>
_______________________________________________
Libmesh-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/libmesh-devel