I have confirmed that the original code segfaults when compiled with gcc 11.2 on my Debian instance, and that it runs to completion after the latest patch. I have also confirmed that, after the latest patch, all other tests pass.
On Fri, Feb 18, 2022 at 1:27 PM Chris Chang <chrchang...@gmail.com> wrote:
> I am installing gcc-11 on my Debian instance now, and will be running more extensive tests today to look for other things that may have stopped working for the same reason.
>
> On Fri, Feb 18, 2022 at 1:22 PM Andreas Tille <andr...@an3as.eu> wrote:
>
>> I confirm it's gcc-11. I'll check tomorrow. Thanks a lot for your quick and helpful responses,
>>
>> Andreas.
>>
>> On Fri, Feb 18, 2022 at 12:53:58PM -0800, Chris Chang wrote:
>> > I have posted an update under the provisional assumption that it's gcc 11's new ipa-modref pass that is causing this code to fail, since it does seem to break some similar code.
>> >
>> > On Fri, Feb 18, 2022 at 11:49 AM Chris Chang <chrchang...@gmail.com> wrote:
>> >
>> > > What compiler version are you using? This implies that the pgl_malloc inline function is not being compiled to the expected code; there is an existing non-inlined version that is used for very old gcc versions, but it looks like it may also be needed here.
>> > >
>> > > On Fri, Feb 18, 2022 at 11:40 AM Andreas Tille <andr...@an3as.eu> wrote:
>> > >
>> > >> Hi again,
>> > >>
>> > >> I applied this patch and now I get:
>> > >>
>> > >> (gdb) run
>> > >> Starting program: /usr/lib/plink2/plink2-sse2 --debug --pfile tmp_data --export vcf vcf-dosage=DS --out tmp_data2
>> > >> [Thread debugging using libthread_db enabled]
>> > >> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> > >> [New Thread 0x7ffff4cc7640 (LWP 4060797)]
>> > >> [New Thread 0x7fffec4c6640 (LWP 4060798)]
>> > >> [New Thread 0x7fffebcc5640 (LWP 4060799)]
>> > >> PLINK v2.00a3 64-bit (29 Jan 2022)
>> > >> www.cog-genomics.org/plink/2.0/
>> > >> (C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
>> > >> Logging to tmp_data2.log.
>> > >> Options in effect:
>> > >>   --debug
>> > >>   --export vcf vcf-dosage=DS
>> > >>   --out tmp_data2
>> > >>   --pfile tmp_data
>> > >>
>> > >> Start time: Fri Feb 18 19:06:45 2022
>> > >> 31998 MiB RAM detected; reserving 15999 MiB for main workspace.
>> > >> Using up to 4 compute threads.
>> > >> [New Thread 0x7ffff7fc5640 (LWP 4060800)]
>> > >> sizeof(PhenoCol): 40 pheno_cols: 0
>> > >> --debug: setting pheno_cols[0].nonmiss. = nullptr
>> > >>
>> > >> Thread 1 "plink2-sse2" received signal SIGSEGV, Segmentation fault.
>> > >> 0x00005555556fb82e in plink2::LoadPsam (psamname=psamname@entry=0x7fffffffbe70 "tmp_data.psam", pheno_range_list_ptr=<optimized out>, fam_cols=..., pheno_ct_max=<optimized out>,
>> > >>     missing_pheno=<optimized out>, affection_01=0, max_thread_ct=4, piip=0x7fffffff8880, sample_include_ptr=0x7fffffff8790, founder_info_ptr=0x7fffffff87a8, sex_nm_ptr=0x7fffffff8798,
>> > >>     sex_male_ptr=0x7fffffff87a0, pheno_cols_ptr=0x7fffffff8770, pheno_names_ptr=0x7fffffff8780, raw_sample_ct_ptr=0x7fffffff8728, pheno_ct_ptr=0x7fffffff8720,
>> > >>     max_pheno_name_blen_ptr=0x7fffffff87b0) at ../plink2_psam.cc:615
>> > >> warning: Source file is more recent than executable.
>> > >> 615         pheno_cols[pheno_idx].nonmiss = nullptr;
>> > >>
>> > >> Kind regards
>> > >>
>> > >> Andreas.
>> > >>
>> > >> On Fri, Feb 18, 2022 at 08:45:12AM -0800, Chris Chang wrote:
>> > >> > OK, I don't know why that particular line would fail, but I've added another debug-print before it on GitHub.
>> > >> >
>> > >> > On Fri, Feb 18, 2022 at 4:24 AM Andreas Tille <andr...@fam-tille.de> wrote:
>> > >> >
>> > >> > > Hi Chris,
>> > >> > >
>> > >> > > On Thu, Feb 17, 2022 at 07:13:49PM -0800, Chris Chang wrote:
>> > >> > > > I was unable to replicate this issue on a Debian EC2 instance.
>> > >> > > > However, there are very few things that happen between printing "End time:" and program exit, so I have added a bunch of debug-prints (active when the --debug flag is passed in) to the latest GitHub commit that should reveal which of those few things is triggering the segfault; let me know if you are able to run this build.
>> > >> > >
>> > >> > > I think the issue is a bit more complex. Debian provides a wrapper that calls the most performant plink2 binary. The issue seems to occur for SFX=avx. First I run:
>> > >> > >
>> > >> > > /usr/lib/plink2/plink2-avx --debug --dummy 33 65537 0.1 dosage-freq=0.1 --out tmp_data
>> > >> > >
>> > >> > > This works. In the next step I fire up gdb, which results in:
>> > >> > >
>> > >> > > (gdb) run
>> > >> > > Starting program: /usr/lib/plink2/plink2-avx --debug --pfile tmp_data --export vcf vcf-dosage=DS --out tmp_data2
>> > >> > > [Thread debugging using libthread_db enabled]
>> > >> > > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> > >> > > [New Thread 0x7ffff4cc7640 (LWP 2931408)]
>> > >> > > [New Thread 0x7ffff44c6640 (LWP 2931409)]
>> > >> > > [New Thread 0x7fffebcc5640 (LWP 2931411)]
>> > >> > > PLINK v2.00a3 SSE4.2 (29 Jan 2022)
>> > >> > > www.cog-genomics.org/plink/2.0/
>> > >> > > (C) 2005-2022 Shaun Purcell, Christopher Chang GNU General Public License v3
>> > >> > > Logging to tmp_data2.log.
>> > >> > > Options in effect:
>> > >> > >   --debug
>> > >> > >   --export vcf vcf-dosage=DS
>> > >> > >   --out tmp_data2
>> > >> > >   --pfile tmp_data
>> > >> > >
>> > >> > > Start time: Fri Feb 18 11:58:49 2022
>> > >> > > 31998 MiB RAM detected; reserving 15999 MiB for main workspace.
>> > >> > > Using up to 4 compute threads.
>> > >> > > [New Thread 0x7ffff7fc5640 (LWP 2931412)]
>> > >> > >
>> > >> > > Thread 1 "plink2-avx" received signal SIGSEGV, Segmentation fault.
>> > >> > > plink2::LoadPsam (psamname=psamname@entry=0x7fffffffbe70 "tmp_data.psam", pheno_range_list_ptr=<optimized out>, fam_cols=..., pheno_ct_max=<optimized out>,
>> > >> > >     missing_pheno=<optimized out>, affection_01=0, max_thread_ct=4, piip=0x7fffffff8880, sample_include_ptr=0x7fffffff87a0, founder_info_ptr=0x7fffffff87b8, sex_nm_ptr=0x7fffffff87a8,
>> > >> > >     sex_male_ptr=0x7fffffff87b0, pheno_cols_ptr=0x7fffffff8780, pheno_names_ptr=0x7fffffff8790, raw_sample_ct_ptr=0x7fffffff8738, pheno_ct_ptr=0x7fffffff8730,
>> > >> > >     max_pheno_name_blen_ptr=0x7fffffff87c0) at ../plink2_psam.cc:611
>> > >> > > warning: Source file is more recent than executable.
>> > >> > > 611         pheno_cols[pheno_idx].nonmiss = nullptr;
>> > >> > >
>> > >> > > I also added some more debug lines in a patch[1].
>> > >> > >
>> > >> > > It seems that this really is the weak part of the code, since the output becomes:
>> > >> > >
>> > >> > > ...
>> > >> > > Start time: Fri Feb 18 13:19:13 2022
>> > >> > > 31998 MiB RAM detected; reserving 15999 MiB for main workspace.
>> > >> > > Using up to 4 compute threads.
>> > >> > > [New Thread 0x7ffff7fc5640 (LWP 3957711)]
>> > >> > > --debug: setting pheno_cols[0].nonmiss. = nullptr
>> > >> > >
>> > >> > > Thread 1 "plink2-sse2" received signal SIGSEGV, Segmentation fault.
>> > >> > > 0x00005555556fb6ff in plink2::LoadPsam (psamname=psamname@entry=0x7fffffffbe70 "tmp_data.psam", pheno_range_list_ptr=<optimized out>, fam_cols=..., pheno_ct_max=<optimized out>,
>> > >> > >     missing_pheno=<optimized out>, affection_01=0, max_thread_ct=4, piip=0x7fffffff8880, sample_include_ptr=0x7fffffff87a0, founder_info_ptr=0x7fffffff87b8, sex_nm_ptr=0x7fffffff87a8,
>> > >> > >     sex_male_ptr=0x7fffffff87b0, pheno_cols_ptr=0x7fffffff8780, pheno_names_ptr=0x7fffffff8790, raw_sample_ct_ptr=0x7fffffff8738, pheno_ct_ptr=0x7fffffff8730,
>> > >> > >     max_pheno_name_blen_ptr=0x7fffffff87c0) at ../plink2_psam.cc:614
>> > >> > > warning: Source file is more recent than executable.
>> > >> > > 614         pheno_cols[pheno_idx].nonmiss = nullptr;
>> > >> > >
>> > >> > > I hope this helps a bit in tracking down the issue.
>> > >> > >
>> > >> > > Andreas.
>> > >> > >
>> > >> > > [1] https://salsa.debian.org/med-team/plink2/-/blob/master/debian/patches/debug2.patch
>> > >> > >
>> > >> > > --
>> > >> > > http://fam-tille.de
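For readers following the ipa-modref hypothesis in this thread: the definition of pgl_malloc is not shown here, so the sketch below is purely illustrative and should not be read as plink2's code. It assumes a hypothetical helper that hands back freshly malloc'd memory through a void* out-parameter. If such a helper stores the result through a cast of the wrong pointer type (for example unsigned char**) while the caller's variable is really a struct pointer, strict aliasing lets GCC assume the caller's pointer was never written, and an interprocedural pass such as ipa-modref could plausibly expose that. The caller would then act on a stale value at a store like `pheno_cols[pheno_idx].nonmiss = nullptr;`, which is where all three backtraces stop; the `pheno_cols: 0` debug line appears consistent with this, though the thread does not confirm what that value represents. The names PhenoColDemo, MallocIntoVoidPtr, MallocArray, and AllocAndClearPhenoCols are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for a record type like PhenoCol; fields are invented.
struct PhenoColDemo {
  const unsigned long* nonmiss;
  double* values;
};

// Hypothetical "malloc through a void* out-parameter" helper, written to be
// aliasing-safe. The risky pattern it avoids (shown only as a comment) is:
//   *static_cast<unsigned char**>(pp) = static_cast<unsigned char*>(std::malloc(size));
// When pp really points at a PhenoColDemo*, that stores through an lvalue of a
// different pointer type, so the optimizer may assume the caller's
// PhenoColDemo* is unchanged. Returns true on allocation failure.
inline bool MallocIntoVoidPtr(std::size_t size, void* pp) {
  void* result = std::malloc(size);
  // memcpy writes the destination as raw bytes, so the compiler must assume
  // whatever object lives at *pp may have changed. This relies on all object
  // pointer types sharing one representation, as on mainstream ABIs.
  std::memcpy(pp, &result, sizeof(result));
  return result == nullptr;
}

// Fully type-safe alternative: the out-parameter already has the right type,
// so no aliasing question arises. Returns true on allocation failure.
template <typename T>
bool MallocArray(std::size_t count, T** out) {
  *out = static_cast<T*>(std::malloc(count * sizeof(T)));
  return *out == nullptr;
}

// Mirrors the failing step in the backtraces: allocate pheno_ct records, then
// clear each nonmiss pointer. Returns the number of records cleared, or 0 if
// the allocation failed.
std::size_t AllocAndClearPhenoCols(std::size_t pheno_ct) {
  PhenoColDemo* pheno_cols = nullptr;
  if (MallocIntoVoidPtr(pheno_ct * sizeof(PhenoColDemo), &pheno_cols)) {
    return 0;
  }
  std::size_t cleared = 0;
  for (std::size_t pheno_idx = 0; pheno_idx != pheno_ct; ++pheno_idx) {
    // Safe only because pheno_cols is re-read after the helper wrote it.
    pheno_cols[pheno_idx].nonmiss = nullptr;
    pheno_cols[pheno_idx].values = nullptr;
    assert(pheno_cols[pheno_idx].nonmiss == nullptr);
    ++cleared;
  }
  std::free(pheno_cols);
  return cleared;
}
```

For a quick check of the hypothesis without changing code, GCC 11 documents an `-fipa-modref` option, so rebuilding with `-fno-ipa-modref` (or, more broadly, `-fno-strict-aliasing`) and rerunning the failing `--export vcf vcf-dosage=DS` command may help confirm or rule it out.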