I developed a code based on libMesh to optimize components subjected to extreme thermal loads. I use a gradient-based optimizer, and the optimization problem has lots of constraint functions, so I have to perform a direct sensitivity analysis where the two systems (thermal and static) both have to be solved once for each design variable. There are many design variables, so, for the sake of efficiency, I solve the system with all of the RHS vectors packed into a matrix. Previously, I did this using PETSc and SuperLU_DIST. For the sake of maintainability and efficiency, I am replacing PETSc with Eigen at the moment, and have run into an issue that I can't understand.
The systems are evaluated using 4 separate functions (this is relevant later): 1) Solve temperatures 2) Solve displacements 3) Differentiate temperatures 4) Differentiate displacements The solve steps use libMesh's interface and internal Eigen solver. Since this can't be used to solve multiple RHS vectors at once, in each of the differentiation functions I make a copy or const reference of the system matrix using a mat() method I added to EigenSparseMatrix. I then create my own Eigen solver, pack the RHS vectors for the sensitivity analysis into a matrix and solve. I'm testing different variants; some of them work just fine, but some of them crash, and I see no logical reason why. - If I use an Eigen::BiCGSTAB<EigenSM> solver for steps #3 and #4, everything works as expected, even if I use the matrix copy, which has a column-major storage order. - If I change the solver in step #4 to Eigen::SparseLU, there is a significant efficiency improvement, and everything works fine. - However, if I change the solver in step #3 to Eigen::SparseLU, or even just to Eigen::BiCGSTAB<Eigen::SparseMatrix<Number>>, I get the following error message and the code crashes: double free or corruption (out) [e2mbacur2018li:27024] *** Process received signal *** [e2mbacur2018li:27024] Signal: Aborted (6) [e2mbacur2018li:27024] Signal code: (-6) [e2mbacur2018li:27024] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fc2af29b890] [e2mbacur2018li:27024] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7fc2aeed6e97] [e2mbacur2018li:27024] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7fc2aeed8801] [e2mbacur2018li:27024] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x89897)[0x7fc2aef21897] [e2mbacur2018li:27024] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x9090a)[0x7fc2aef2890a] [e2mbacur2018li:27024] [ 5] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x525)[0x7fc2aef2fe75] [e2mbacur2018li:27024] [ 6] /usr/local/lib/libmesh_opt.so.0(_ZN5Eigen8internal8bicgstabINS_3RefIKNS_12SparseMatrixIdLi1EiEELi0ENS_11OuterStrideILin1EEEEENS_5BlockIKNS_6MatrixIdLin1ELi1ELi0ELin1ELi1EEELin1ELi1ELb1EEENS9_ISB_Lin1ELi1ELb1EEENS_22DiagonalPreconditionerIdEEEEbRKT_RKT0_RT1_RKT2_RlRNSN_10RealScalarE+0x5a5a)[0x7fc2b11c3e8a] [e2mbacur2018li:27024] [ 7] /usr/local/lib/libmesh_opt.so.0(_ZN7libMesh23EigenSparseLinearSolverIdE5solveERNS_12SparseMatrixIdEERNS_13NumericVectorIdEES7_dj+0x1946)[0x7fc2b11d7436] [e2mbacur2018li:27024] [ 8] /usr/local/lib/libmesh_opt.so.0(_ZN7libMesh20LinearImplicitSystem5solveEv+0x40d)[0x7fc2b1233f7d] [e2mbacur2018li:27024] [ 9] ./mspfc_td(+0xd635c)[0x557868ecb35c] [e2mbacur2018li:27024] [10] ./mspfc_td(+0xcdf39)[0x557868ec2f39] [e2mbacur2018li:27024] [11] ./mspfc_td(+0x9a63b)[0x557868e8f63b] [e2mbacur2018li:27024] [12] ./mspfc_td(+0x2229a)[0x557868e1729a] [e2mbacur2018li:27024] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fc2aeeb9b97] [e2mbacur2018li:27024] [14] ./mspfc_td(+0x2381a)[0x557868e1881a] [e2mbacur2018li:27024] *** End of error message *** Aborted (core dumped) Valgrind confirms the confusing part: the choice of solver in the function for step #3 (even the choice of the matrix type template parameter), causes a memory error in the solver used in step #1. Would you guys have any idea why changing the type of an Eigen solver external to the libMesh library would cause a memory error in the internal solver before the external solver even comes into scope? I'd appreciate any insight. For some reason, the choice of solver in step #4 seems to have no connection to the issue at all. Below is an excerpt from valgrind. Regards, Bailey ==27128== Invalid free() / delete / delete[] / realloc() ==27128== at 0x4C30D3B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==27128== by 0x57C0E89: bool Eigen::internal::bicgstab<Eigen::Ref<Eigen::SparseMatrix<double, 1, int> const, 0, Eigen::OuterStride<-1> >, Eigen::Block<Eigen::Matrix<double, -1, 1, 0, -1, 1> const, -1, 1, true>, Eigen::Block<Eigen::Matrix<double, -1, 1, 0, -1, 1>, -1, 1, true>, Eigen::DiagonalPreconditioner<double> >(Eigen::Ref<Eigen::SparseMatrix<double, 1, int> const, 0, Eigen::OuterStride<-1> > const&, Eigen::Block<Eigen::Matrix<double, -1, 1, 0, -1, 1> const, -1, 1, true> const&, Eigen::Block<Eigen::Matrix<double, -1, 1, 0, -1, 1>, -1, 1, true>&, Eigen::DiagonalPreconditioner<double> const&, long&, Eigen::Block<Eigen::Matrix<double, -1, 1, 0, -1, 1>, -1, 1, true>::RealScalar&) (in /usr/local/lib/libmesh_opt.so.0.0.0) ==27128== by 0x57D4435: libMesh::EigenSparseLinearSolver<double>::solve(libMesh::SparseMatrix<double>&, libMesh::NumericVector<double>&, libMesh::NumericVector<double>&, double, unsigned int) (in /usr/local/lib/libmesh_opt.so.0.0.0) ==27128== by 0x5830F7C: libMesh::LinearImplicitSystem::solve() (in /usr/local/lib/libmesh_opt.so.0.0.0) ==27128== by 0x1DE35B: MinStressPFC_TD::_solve_thermal() (in /home/bacur/minstresspfc/mspfc_td) ==27128== by 0x1D5F38: MinStressPFC::eval_sys_equations(unsigned int, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> >&, std::vector<double, std::allocator<double> >&) (in /home/bacur/minstresspfc/mspfc_td) ==27128== by 0x1A263A: OptimizerMMA_eigen::iterate() (in /home/bacur/minstresspfc/mspfc_td) ==27128== by 0x12A299: main (in /home/bacur/minstresspfc/mspfc_td) ==27128== Address 0x1905ee00 is 32 bytes inside a block of size 2,880 alloc'd ==27128== at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==27128== by 0x16C97C: Eigen::internal::aligned_malloc(unsigned long) (in /home/bacur/minstresspfc/mspfc_td) ==27128== by 0x1BF31B: Eigen::PlainObjectBase<Eigen::Matrix<double, -1, 1, 0, -1, 1> >::resize(long, long) (in /home/bacur/minstresspfc/mspfc_td) ==27128== by 0x57BBBC6: bool Eigen::internal::bicgstab<Eigen::Ref<Eigen::SparseMatrix<double, 1, int> const, 0, Eigen::OuterStride<-1> >, Eigen::Block<Eigen::Matrix<double, -1, 1, 0, -1, 1> const, -1, 1, true>, Eigen::Block<Eigen::Matrix<double, -1, 1, 0, -1, 1>, -1, 1, true>, Eigen::DiagonalPreconditioner<double> >(Eigen::Ref<Eigen::SparseMatrix<double, 1, int> const, 0, Eigen::OuterStride<-1> > const&, Eigen::Block<Eigen::Matrix<double, -1, 1, 0, -1, 1> const, -1, 1, true> const&, Eigen::Block<Eigen::Matrix<double, -1, 1, 0, -1, 1>, -1, 1, true>&, Eigen::DiagonalPreconditioner<double> const&, long&, Eigen::Block<Eigen::Matrix<double, -1, 1, 0, -1, 1>, -1, 1, true>::RealScalar&) (in /usr/local/lib/libmesh_opt.so.0.0.0) ==27128== by 0x57D4435: libMesh::EigenSparseLinearSolver<double>::solve(libMesh::SparseMatrix<double>&, libMesh::NumericVector<double>&, libMesh::NumericVector<double>&, double, unsigned int) (in /usr/local/lib/libmesh_opt.so.0.0.0) ==27128== by 0x5830F7C: libMesh::LinearImplicitSystem::solve() (in /usr/local/lib/libmesh_opt.so.0.0.0) ==27128== by 0x1DE35B: MinStressPFC_TD::_solve_thermal() (in /home/bacur/minstresspfc/mspfc_td) ==27128== by 0x1D5F38: MinStressPFC::eval_sys_equations(unsigned int, std::vector<double, std::allocator<double> > const&, std::vector<double, std::allocator<double> >&, std::vector<double, std::allocator<double> >&) (in /home/bacur/minstresspfc/mspfc_td) ==27128== by 0x1A263A: OptimizerMMA_eigen::iterate() (in /home/bacur/minstresspfc/mspfc_td) ==27128== by 0x12A299: main (in /home/bacur/minstresspfc/mspfc_td) _______________________________________________ Libmesh-users mailing list Libmesh-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/libmesh-users