Control: tags -1 confirmed Control: reassign -1 salmon Control: affects -1 pigx-rnaseq Control: retitle -1 salmon call causing pigx-rnaseq to infinitely loop
Hi,
Lucas Nussbaum, on 2023-01-14:
> > /usr/bin/Rscript --vanilla /<<PKGBUILDDIR>>/scripts/runDeseqReport.R
> > --logo=/<<PKGBUILDDIR>>/images/Logo_PiGx.png --prefix='analysis1'
> > --reportFile=/<<PKGBUILDDIR>>/scripts/deseqReport.Rmd
> > --countDataFile=/<<PKGBUILDDIR>>/tests/output/feature_counts/raw_counts/hisat2/counts.tsv
> > --colDataFile=/<<PKGBUILDDIR>>/tests/output/colData.tsv
> > --gtfFile=/<<PKGBUILDDIR>>/tests/sample_data/sample.gtf
> > --caseSampleGroups='HBR' --controlSampleGroups='UHR' --covariates=''
> > --workdir=/<<PKGBUILDDIR>>/tests/output/report/hisat2 --organism=''
> > --description='This analysis is part of the pigx-rnaseq build-time tests.'
> > --selfContained='True' >>
> > /<<PKGBUILDDIR>>/tests/output/logs/hisat2/analysis1.report.log 2>&1
> > [Fri Jan 13 20:26:39 2023]
> > Finished job 38.
> > 35 of 48 steps (73%) done
> > [Fri Jan 13 20:26:53 2023]
> > Finished job 45.
> > 36 of 48 steps (75%) done
> > E: Build killed with signal TERM after 150 minutes of inactivity
From preliminary investigation, this looks to be caused by a
salmon process caught in infinite loop at the very step shown by
the hereover hardcopy. Running an strace on salmon process does
not show any system call, which does not help locating where
we're caught.
If I attach to the process with gdb, I notice the loop seems to
be in libjemalloc2, oscillating between functions from
jemalloc/internal/ph.h and jemalloc/internal/edata.h when
running step by step. Here after is a backtrace caught at a
random point in time during salmon's infinite loop:
#0 phn_prev_set (offset=<optimized out>, prev=<optimized out>,
phn=<optimized out>) at include/jemalloc/internal/ph.h:128
#1 phn_merge_ordered (cmp=<optimized out>, offset=40,
phn1=0x7f495fa17280,
phn0=0x7f495fa17280) at include/jemalloc/internal/ph.h:126
#2 phn_merge (cmp=<optimized out>, offset=40, phn1=0x7f495fa17280,
phn0=0x7f495fa17280) at include/jemalloc/internal/ph.h:146
#3 phn_merge_siblings (cmp=<optimized out>, offset=40, phn=<optimized
out>)
at include/jemalloc/internal/ph.h:209
#4 ph_merge_aux (cmp=<optimized out>, offset=40, ph=0x7f495fa03b38)
at include/jemalloc/internal/ph.h:231
#5 ph_first (cmp=<optimized out>, offset=40, ph=0x7f495fa03b38,
ph@entry=0xd4eb5f85f001) at include/jemalloc/internal/ph.h:265
#6 je_edata_heap_first (ph=ph@entry=0x7f495fa03b38) at src/edata.c:6
#7 0x00007f4961338cb9 in eset_first_fit (lg_max_fit=<optimized out>,
exact_only=<optimized out>, size=4096, eset=0x7f495fa03b18)
at src/eset.c:243
#8 je_eset_fit (eset=eset@entry=0x7f495fa03b18,
esize=esize@entry=4096,
alignment=alignment@entry=4096, exact_only=exact_only@entry=false,
lg_max_fit=<optimized out>) at src/eset.c:270
#9 0x00007f496133b958 in extent_recycle_extract
(ehooks=0x7f495fa000c0,
guarded=false, alignment=4096, size=4096, expand_edata=0x0,
ecache=0x7f495fa03aa8, pac=0x7f495fa03a70, tsdn=0x7f495fdfe968)
at src/extent.c:437
#10 extent_recycle (tsdn=tsdn@entry=0x7f495fdfe968,
pac=pac@entry=0x7f495fa03a70, ehooks=ehooks@entry=0x7f495fa000c0,
ecache=ecache@entry=0x7f495fa03aa8,
expand_edata=expand_edata@entry=0x0,
size=size@entry=4096, alignment=<optimized out>, zero=<optimized
out>,
commit=<optimized out>, growing_retained=<optimized out>,
guarded=<optimized out>) at src/extent.c:606
#11 0x00007f496133bad9 in je_ecache_alloc
(tsdn=tsdn@entry=0x7f495fdfe968,
pac=pac@entry=0x7f495fa03a70, ehooks=ehooks@entry=0x7f495fa000c0,
ecache=ecache@entry=0x7f495fa03aa8,
expand_edata=expand_edata@entry=0x0,
size=size@entry=4096, alignment=4096, zero=false, guarded=false)
at src/extent.c:87
#12 0x00007f4961346d5d in pac_alloc_real
(tsdn=tsdn@entry=0x7f495fdfe968,
pac=pac@entry=0x7f495fa03a70, ehooks=ehooks@entry=0x7f495fa000c0,
size=size@entry=4096, alignment=alignment@entry=4096,
zero=zero@entry=false, guarded=false) at src/pac.c:116
#13 0x00007f4961346ee6 in pac_alloc_impl (tsdn=0x7f495fdfe968,
self=0x7f495fa03a70, size=4096, alignment=4096, zero=<optimized
out>,
guarded=<optimized out>, frequent_reuse=true,
deferred_work_generated=0x7ffea3d0cb37) at src/pac.c:178
#14 0x00007f4961345c5c in pai_alloc
(deferred_work_generated=0x7ffea3d0cb37,
frequent_reuse=true, guarded=<optimized out>, zero=false,
alignment=<optimized out>, size=4096, self=0x7f495fa03a70,
tsdn=0x7f495fdfe968) at include/jemalloc/internal/pai.h:43
#15 je_pa_alloc (tsdn=tsdn@entry=0x7f495fdfe968,
shard=shard@entry=0x7f495fa03a58, size=4096,
alignment=alignment@entry=4096, slab=slab@entry=true,
szind=szind@entry=20, zero=false, guarded=false,
deferred_work_generated=0x7ffea3d0cb37) at src/pa.c:139
#16 0x00007f49612e7a8d in arena_slab_alloc
(tsdn=tsdn@entry=0x7f495fdfe968,
arena=arena@entry=0x7f495fa010c0, binind=binind@entry=20,
binshard=binshard@entry=0,
bin_info=bin_info@entry=0x7f4961588f20 <je_bin_infos+800>)
at src/arena.c:839
#17 0x00007f49612e8f00 in je_arena_cache_bin_fill_small (
tsdn=tsdn@entry=0x7f495fdfe968, arena=arena@entry=0x7f495fa010c0,
cache_bin=cache_bin@entry=0x7f495fdfeea8, cache_bin_info=<optimized
out>,
binind=binind@entry=20, nfill=1) at src/arena.c:1034
#18 0x00007f496135b3d9 in je_tcache_alloc_small_hard (
tsdn=tsdn@entry=0x7f495fdfe968, arena=arena@entry=0x7f495fa010c0,
tcache=tcache@entry=0x7f495fdfecc0,
cache_bin=cache_bin@entry=0x7f495fdfeea8, binind=binind@entry=20,
tcache_success=tcache_success@entry=0x7ffea3d0cc70) at
src/tcache.c:238
#19 0x00007f49612d5612 in tcache_alloc_small (slow_path=<optimized
out>,
zero=false, binind=20, size=<optimized out>, tcache=0x7f495fdfecc0,
arena=0x7f495fa010c0, tsd=0x7f495fdfe968)
at include/jemalloc/internal/tcache_inlines.h:68
#20 arena_malloc (slow_path=<optimized out>, tcache=0x7f495fdfecc0,
zero=false, ind=20, size=<optimized out>, arena=0x0,
tsdn=0x7f495fdfe968)
at include/jemalloc/internal/arena_inlines_b.h:151
#21 iallocztm (slow_path=<optimized out>, arena=0x0, is_internal=false,
tcache=0x7f495fdfecc0, zero=false, ind=20, size=<optimized out>,
tsdn=0x7f495fdfe968)
at include/jemalloc/internal/jemalloc_internal_inlines_c.h:55
#22 imalloc_no_sample (ind=20, usize=1024, size=<optimized out>,
tsd=0x7f495fdfe968, dopts=<synthetic pointer>, sopts=<synthetic
pointer>)
at src/jemalloc.c:2398
#23 imalloc_body (tsd=0x7f495fdfe968, dopts=<synthetic pointer>,
sopts=<synthetic pointer>) at src/jemalloc.c:2573
#24 imalloc (dopts=<optimized out>, sopts=<optimized out>)
at src/jemalloc.c:2687
#25 je_malloc_default (size=<optimized out>) at src/jemalloc.c:2722
#26 0x00007f4961360df9 in fallback_impl<false> (size=928)
at src/jemalloc_cpp.cpp:98
#27 0x00007f4961360ed5 in imalloc_fastpath (
fallback_alloc=0x7f4961360df0 <fallback_impl<false>(std::size_t)>,
size=<optimized out>)
at include/jemalloc/internal/jemalloc_internal_inlines_c.h:266
#28 0x000055a26c31cc0d in std::__new_allocator<std::pair<unsigned long,
bool> >::allocate (this=0x7f495f8945a8, __n=<optimized out>)
at /usr/include/c++/12/bits/new_allocator.h:112
#29 std::allocator_traits<std::allocator<std::pair<unsigned long, bool>
> >::allocate (__n=<optimized out>, __a=...)
at /usr/include/c++/12/bits/alloc_traits.h:464
#30 std::_Vector_base<std::pair<unsigned long, bool>,
std::allocator<std::pair<unsigned long, bool> > >::_M_allocate (__n=<optimized
out>,
this=<optimized out>) at /usr/include/c++/12/bits/stl_vector.h:378
#31 std::vector<std::pair<unsigned long, bool>,
std::allocator<std::pair<unsigned long, bool> > >::_M_default_append
(this=0x7f495f8945a8, __n=58)
at /usr/include/c++/12/bits/vector.tcc:650
#32 0x000055a26c30340c in std::vector<std::pair<unsigned long, bool>,
std::allocator<std::pair<unsigned long, bool> > >::resize
(__new_size=<optimized out>,
this=<optimized out>) at /usr/include/c++/12/bits/stl_vector.h:1011
#33 pufferfish::BinaryGFAReader::parseFile (this=0x7ffea3d0e340)
at ./external/pufferfish/src/PufferfishBinaryGFAReader.cpp:161
#34 0x000055a26c306d10 in pufferfishIndex (indexOpts=...)
at ./external/pufferfish/src/PufferfishIndexer.cpp:541
#35 0x000055a26bee06ae in SalmonIndex::buildPuffIndex_ (idxOpt=...,
indexDir=..., this=0x7f495f83e280) at ./include/SalmonIndex.hpp:111
#36 SalmonIndex::build (idxOpt=..., indexDir=..., this=0x7f495f83e280)
at ./include/SalmonIndex.hpp:76
#37 salmonIndex (argc=<optimized out>, argv=<optimized out>)
at ./src/BuildSalmonIndex.cpp:247
#38 0x000055a26be7b510 in std::function<int (int, char const**,
std::unique_ptr<SalmonIndex, std::default_delete<SalmonIndex>
>&)>::operator()(int, char const**, std::unique_ptr<SalmonIndex,
std::default_delete<SalmonIndex> >&) const (
__args#2=std::unique_ptr<SalmonIndex> = {...}, __args#1=<optimized
out>,
__args#0=<optimized out>, this=0x7f495f84e338)
at /usr/include/c++/12/bits/std_function.h:591
#39 main (argc=<optimized out>, argv=0x7ffea3d0fea8) at
./src/Salmon.cpp:267
Here is an almost minimal repreducer, using the pigx-rnaseq test
data raising the issue (that's 6.1M of data, so not exactly
minimal neither):
wget
'https://salsa.debian.org/med-team/pigx-rnaseq/-/raw/debian/0.1.0-1.1/tests/sample_data/sample.cdna.fasta'
salmon index -t sample.cdna.fasta -i salmon_index -p 8
Interestingly my "reproducer" is flaky and tends to raise
segmentation faults on occasions, instead of getting caught into
infinite loop. Actually I just witnessed salmon failing to
build from source with such an error. Other test data may show
consistent crash, or the indexing go through completely.
The issue could very well be a jemalloc bug, although other
reverse dependencies do not seem to be affected by similar
issues, but maybe this is something to keep in mind. The issue
would need to be reduced further to identify the root cause
exactly.
Have a nice day, :)
--
Étienne Mollier <[email protected]>
Fingerprint: 8f91 b227 c7d6 f2b1 948c 8236 793c f67e 8f0d 11da
Sent from /dev/pts/1, please excuse my verbosity.
On air: Big Big Train - London Plane
signature.asc
Description: PGP signature

