On 02/04/2014 19:44, Kevin Greenan wrote:
> Hey Loic,
>
> Are you ensuring that Jerasure (actually gf-complete) is getting memory
> buffers aligned on 16-byte boundaries? Without looking too deep, that is the
> first thing I would check.
>

Yes

https://github.com/ceph/ceph/blob/master/src/erasure-code/jerasure/ErasureCodeJerasure.cc#L32
https://github.com/ceph/ceph/blob/master/src/erasure-code/jerasure/ErasureCodeJerasure.cc#L242
https://github.com/ceph/ceph/blob/master/src/erasure-code/jerasure/ErasureCodeJerasure.cc#L65
https://github.com/ceph/ceph/blob/master/src/erasure-code/jerasure/ErasureCodeJerasure.cc#L108

I'll re-read this logic tomorrow just to be sure.

Cheers

> I can have a deeper look later today or tomorrow.
>
> -kevin
>
>
> On Wed, Apr 2, 2014 at 10:35 AM, Loic Dachary <[email protected]
> <mailto:[email protected]>> wrote:
>
> Hi Kevin,
>
> In the context of http://tracker.ceph.com/issues/7914 we're trying to
> figure out why jerasure dumps core. We don't know how to reproduce it yet
> (ran dozens of identical tests suites with no such crash in the past few
> days, which is to be expected for rare bugs because the test suite introduces
> random errors / failures on purpose).
>
> The full stack trace is at http://tracker.ceph.com/issues/7914#note-24
> but the relevant part is here:
>
> #0 0x00007f4756779b7b in raise (sig=<optimized out>) at
> ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
> #1 0x0000000000981b4e in reraise_fatal (signum=11) at
> global/signal_handler.cc:59
> #2 handle_fatal_signal (signum=11) at global/signal_handler.cc:105
> #3 <signal handler called>
> #4 0x0000000000000000 in ?? ()
> #5 0x00007f47385ae6b1 in jerasure_matrix_dotprod (k=2, w=8,
> matrix_row=0x31513a8, src_ids=0x0, dest_id=<optimized out>,
> data_ptrs=0x7f4741ec7a00, coding_ptrs=0x7f4741ec7a10,
> size=2048) at erasure-code/jerasure/jerasure/src/jerasure.c:607
> #6 0x00007f47385ae7d6 in jerasure_matrix_encode (k=2, m=1, w=8,
> matrix=<optimized out>, data_ptrs=0x7f4741ec7a00, coding_ptrs=0x7f4741ec7a10,
> size=2048)
> at erasure-code/jerasure/jerasure/src/jerasure.c:310
> ...
>
> Note that this jerasure/gf-complete combination has been compiled with
> SSE4.1, SSE4.2, PCLMUL, SSSE3, SSE3, SSE2, SSE flags activated. These are
> jerasure v2 and gf-complete v1, only slightly modified as found in
> https://github.com/ceph/jerasure/tree/v2-ceph and
> https://github.com/ceph/gf-complete/tree/v1-ceph (all commits there have a
> pending pull request under https://bitbucket.org/jimplank/gf-complete
> https://bitbucket.org/jimplank/jerasure, nothing you've not seen before).
>
> #5 is https://github.com/ceph/jerasure/blob/v2-ceph/src/jerasure.c#L607
>
> and then it dives into gf-complete and most probably destroyed part of
> the stack when corrupting memory. I'll be chasing this tomorrow. If you have
> a brilliant idea on why that happens, I'll take it ;-)
>
> Cheers
>
> --
> Loïc Dachary, Artisan Logiciel Libre
>
>

-- 
Loïc Dachary, Artisan Logiciel Libre
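
For reference, the 16-byte alignment check Kevin asks about can be sketched in a few lines of C. This is only an illustration, not code from Ceph, jerasure or gf-complete: the helper names (is_aligned, count_misaligned, alignment_self_check) are hypothetical, the 16-byte figure is taken from the question above, and k=2, m=1, size=2048 are copied from the stack trace. It assumes a C11 libc that provides aligned_alloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption from the thread: buffers should sit on 16-byte boundaries. */
#define SIMD_ALIGN 16

/* Returns 1 if ptr is a multiple of alignment (which must be a power of two). */
int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Hypothetical helper: counts the buffers in ptrs[0..n-1] that are misaligned. */
int count_misaligned(char **ptrs, int n, size_t alignment)
{
    int bad = 0;
    for (int i = 0; i < n; i++) {
        if (!is_aligned(ptrs[i], alignment))
            bad++;
    }
    return bad;
}

/* Allocates k=2 data and m=1 coding buffers of 2048 bytes (values from the
 * trace), checks them, then checks a deliberately shifted pointer.
 * Returns 1 on the expected outcome, 0 otherwise, -1 if allocation fails. */
int alignment_self_check(void)
{
    enum { K = 2, M = 1, SIZE = 2048 };
    char *data[K] = { NULL, NULL };
    char *coding[M] = { NULL };
    int result = -1;

    for (int i = 0; i < K; i++)
        data[i] = aligned_alloc(SIMD_ALIGN, SIZE);
    for (int i = 0; i < M; i++)
        coding[i] = aligned_alloc(SIMD_ALIGN, SIZE);

    if (data[0] != NULL && data[1] != NULL && coding[0] != NULL) {
        int bad = count_misaligned(data, K, SIMD_ALIGN)
                + count_misaligned(coding, M, SIMD_ALIGN);
        /* One byte past an aligned address can never be 16-byte aligned. */
        char *shifted[1] = { data[0] + 1 };
        int bad_shifted = count_misaligned(shifted, 1, SIMD_ALIGN);
        result = (bad == 0 && bad_shifted == 1);
    }

    for (int i = 0; i < K; i++)
        free(data[i]);
    for (int i = 0; i < M; i++)
        free(coding[i]);
    return result;
}
```

A check like count_misaligned could be run on data_ptrs and coding_ptrs just before the encode call to rule alignment in or out. It says nothing about the null address in frame #4, which would still need its own explanation.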
