Thank you very much, Roy and Cody. I did some tests for ex4 in libmesh. When using a CPU coure, the cost time for solver and matrix assembly is as follows: | PetscLinearSolver | | solve() 1 69.5099 69.509936 69.5099 69.509936 75.78 75.78 | | | | System | | assemble() 1 14.3838 14.383849 18.0873 18.087271 15.68 19.72 | ------------------------------------------------------------------------------------------------------------ | Totals: 22294 91.7241 100.00 | ------------------------------------------------------------------------------------------------------------
When using a GPU card, the time is as follows: | PetscLinearSolver | | solve() 1 28.6089 28.608886 28.6089 28.608886 3.62 3.62 | | | | System | | assemble() 1 752.7547 752.754713 756.7779 756.777857 95.37 95.88 | ------------------------------------------------------------------------------------------------------------ | Totals: 22294 789.3232 100.00 | ------------------------------------------------------------------------------------------------------------ To my understanding, two possible reasons for very slow matrix assembly in GPU are 1. there is not memory preallocation 2. frequently memory operations for GPU As Jed said, I am not sure whether there is memory preallocation in libMesh. If you did it, the first reason can be removed. Thanks a lot, Yujie On 2/13/12, Cody Permann <codyperm...@gmail.com> wrote: > On Mon, Feb 13, 2012 at 11:01 AM, Roy Stogner > <royst...@ices.utexas.edu>wrote: > >> >> On Mon, 13 Feb 2012, recrusader wrote: >> >> I am not sure that Jed mentioned the following stuff to you. >>> Now, I can realize GPU computation with libMesh and PETSc in one or two >>> cards. >>> However, compared to CPU implementation, matrix assembly using GPU is >>> very slow. >>> >> > This is to be expected - if you think about what happens in assembly we are > looping over elements and building up just a "few" values at a time for our > local element matrices. We then scatter those into the PETSc vector. The > startup cost and speed of the GPU processors means that you need to be > doing hundreds or thousands of computations to overcome the memory transfer > costs. I've done preliminary work in libMesh showing that this "can" > be efficiently but it is non-trivial and will require significant changes > in the way these algorithms are designed. > > I'd like to get back to this problem but we (MOOSE) have been swamped with > "real" work :) Are you looking to do research in this area or just hoping > for a quick speedup of your existing problems? > > >> I take it the problem is that when you switch the matrix to a >> GPU-based type, the sparsity data is lost? >> > > Like I said above the problem is a little more complicated than that. You > need to consider amount of work to be done per GPU call which as libMesh is > currently structured, is way too low. > > Cody > > >> Is it easy to add the following codes in libMesh? >>> >> >> I have no idea. Forwarding to libmesh-devel, in case any of the more >> PETSc-literate developers can say more. >> --- >> Roy >> >> Thank you very much, >>> Yujie >>> >>> ---------- Forwarded message ---------- >>> From: Jed Brown <jedbr...@mcs.anl.gov> >>> Date: Sat, Feb 11, 2012 at 11:06 AM >>> Subject: Re: [petsc-users] MatSetValues() for MATMPICUSP >>> To: PETSc users list <petsc-us...@mcs.anl.gov> >>> >>> >>> On Sat, Feb 11, 2012 at 10:58, recrusader <recrusa...@gmail.com> wrote: >>> " ierr = MatCreateMPIAIJ (libMesh::COMM_WORLD, >>> >>> m_local, n_local, >>> ** m_global, n_global, >>> ** PETSC_NULL, (int*) &n_nz[0], >>> ** PETSC_NULL, (int*) &n_oz[0], &_mat); >>> CHKERRABORT(libMesh::COMM_**WORLD,ierr); >>> >>> MatSetOption(_mat,MAT_NEW_**NONZERO_ALLOCATION_ERR,PETSC_**FALSE); >>> //by Yujie >>> std::cout<<"MatSetOption"<<**std::endl;" >>> >>> I run the same codes in CPU and GPU modes (the same parameters except >>> that GPU uses '-vec_type mpicusp -mat_type >>> mpiaijcusp'). I can find "MatSetOption" output from both the modes. Does >>> that mean that the codes set the options for both >>> the modes? >>> >>> >>> Libmesh is calling MatSetFromOptions() after MatCreateMPIAIJ() which >>> means the preallocation information will be lost if the >>> type is changed. The code should be written differently >>> ierr = MatCreate(comm,A);CHKERRQ(**ierr); >>> ierr = MatSetSizes(*A,m,n,M,N);**CHKERRQ(ierr); >>> ierr = MPI_Comm_size(comm,&size);**CHKERRQ(ierr); >>> ierr = MatSetType(*A,MATAIJ);CHKERRQ(**ierr); >>> ierr = MatSetOptionsPrefix(*A,**optional_prefix);CHKERRQ(ierr)**; >>> ierr = MatSetFromOptions(*A);CHKERRQ(**ierr); >>> ierr = MatMPIAIJSetPreallocation(*A,**d_nz,d_nnz,o_nz,o_nnz);** >>> CHKERRQ(ierr); >>> ierr = MatSeqAIJSetPreallocation(*A,**d_nz,d_nnz);CHKERRQ(ierr); >>> >>> I can talk to the libmesh developers about making this change. >> >> >> >> ------------------------------------------------------------------------------ >> Try before you buy = See our experts in action! >> The most comprehensive online learning library for Microsoft developers >> is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3, >> Metro Style Apps, more. Free future releases when you subscribe now! >> http://p.sf.net/sfu/learndevnow-dev2 >> _______________________________________________ >> Libmesh-devel mailing list >> Libmesh-devel@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/libmesh-devel >> >> > ------------------------------------------------------------------------------ Try before you buy = See our experts in action! The most comprehensive online learning library for Microsoft developers is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3, Metro Style Apps, more. Free future releases when you subscribe now! http://p.sf.net/sfu/learndevnow-dev2 _______________________________________________ Libmesh-devel mailing list Libmesh-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/libmesh-devel