Attached are two patches against the Debian-packaged perl
5.28.1-6+deb10u1 which I used for tracking down the attempt to
access @DB::args of PublicInbox::Listener::event_step as the
source of the segfault.
I don't know Perl internals very well, and I was never an
advanced gdb user when I hacked C.
While most segfaults I saw did not emit any errors, maybe 1-2 of
them pointed out something from Carp (the error reporting and
backtrace module of Perl); so I wasn't sure if there were
multiple sources of segfaults.
I initially thought it was something warning during the
destruction phase because most of the code isn't doing anything
out-of-the-ordinary aside from short-lived workers.
I then ran lei with debugperl(1) (deb: perl-debug) instead of
the normal "perl" on my system. That got me to the assertion
failure in sv.c::Perl_sv_clear at
assert(SvTYPE(sv) != (svtype)SVTYPEMASK);
So I made the attached patch to sv.c to call Perl_op_dump (which
I learned about from the perlhacktips(1) manpage).
This patch to sv.c got me a dump which put me around lines 340-350
of Carp.pm. Poking into Carp.pm brought me to this comment
around lines 330:
# Guard our serialization of the stack from stack refcounting bugs
# NOTE this is NOT a complete solution, we cannot 100% guard against
# these bugs. However in many cases Perl *is* capable of detecting
# them and throws an error when it does. Unfortunately serializing
# the arguments on the stack is a perfect way of finding these bugs,
# even when they would not affect normal program flow that did not
# poke around inside the stack. Inside of Carp.pm it makes little
# sense reporting these bugs, as Carp's job is to report the callers
# errors, not the ones it might happen to tickle while doing so.
# See: https://rt.perl.org/Public/Bug/Display.html?id=131046
# and: https://rt.perl.org/Public/Bug/Display.html?id=52610
# for more details and discussion. - Yves
The conundrum is Carp itself is generating backtraces, so
getting a Perl-level backtrace becomes tricky...
I decided to edit Carp.pm to store the caller-supplied error
message in a global variable, and set the G_ERR environment
variable before where it would occasionally segfault.
Once the environment variable is set, printing fields of the
global **environ array from gdb on the core dump where G_ERR is
stored would get me the $sub_name and offending error message.
It turned out the error was the confess call in
PublicInbox::Eml::body_str which was copied over from
Email::MIME, and $sub_name was PublicInbox::Listener::event_step.
That led me to realize $DescriptorMap{$fd} was no longer
referenced clobbered and to this workaround.
I didn't want to spend time compiling, relinking and install
Perl again (and didn't want to do it the first time), but I
could've printed the G_ERR environment where I was doing
Perl_op_dump, too.
>From 03712da7eb875b18879662079279241cd4fee2d0 Mon Sep 17 00:00:00 2001
From: Eric Wong <[email protected]>
Date: Mon, 1 Feb 2021 05:42:42 +0000
Subject: [PATCH 1/2] sv.c: run Perl_op_dump before failing assertion
The debugperl(1) binary packaged by Debian alerted me
to an assertion in Perl_sv_clear failing at:
assert(SvTYPE(sv) != (svtype)SVTYPEMASK);
To get more information about what Perl code is executing,
we can run Perl_op_dump on PL_op before triggering the
assertion.
---
sv.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/sv.c b/sv.c
index 07865bb..2c950e0 100644
--- a/sv.c
+++ b/sv.c
@@ -6510,6 +6510,9 @@ Perl_sv_clear(pTHX_ SV *const orig_sv)
type = SvTYPE(sv);
assert(SvREFCNT(sv) == 0);
+ if ((SvTYPE(sv) == (svtype)SVTYPEMASK)) {
+ Perl_op_dump(aTHX_ PL_op);
+ }
assert(SvTYPE(sv) != (svtype)SVTYPEMASK);
if (type <= SVt_IV) {
>From ef7a02def4468ac7301c120ece586caea7351c4d Mon Sep 17 00:00:00 2001
From: Eric Wong <[email protected]>
Date: Mon, 1 Feb 2021 05:34:12 +0000
Subject: [PATCH 2/2] carp: set G_ERR in environ before accessing @DB::args
In case accessing @DB::args causes a segfault and core dump;
one can open the core dump with gdb(1) run "p environ[$IDX]"
to figure out what message and sub_name is tickling Perl5's
stack-not-refcounted behavior and causing a segfault.
($IDX is typically the last environment variable index in the
environ(7) array, but it could be another number)
---
dist/Carp/lib/Carp.pm | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/dist/Carp/lib/Carp.pm b/dist/Carp/lib/Carp.pm
index 109b7fe..c87e373 100644
--- a/dist/Carp/lib/Carp.pm
+++ b/dist/Carp/lib/Carp.pm
@@ -302,6 +302,8 @@ BEGIN {
}
}
+our $g_err;
+
sub caller_info {
my $i = shift(@_) + 1;
my %call_info;
@@ -327,6 +329,7 @@ sub caller_info {
my $sub_name = Carp::get_subname( \%call_info );
if ( $call_info{has_args} ) {
+$ENV{G_ERR} = "$sub_name $g_err";
# Guard our serialization of the stack from stack refcounting bugs
# NOTE this is NOT a complete solution, we cannot 100% guard against
# these bugs. However in many cases Perl *is* capable of detecting
@@ -594,6 +597,7 @@ sub ret_backtrace {
$tid_msg = " thread $tid" if $tid;
}
+$g_err = $err;
my %i = caller_info($i);
$mess = "$err at $i{file} line $i{line}$tid_msg";
if( $. ) {
@@ -635,6 +639,7 @@ sub ret_summary {
my $tid = threads->tid;
$tid_msg = " thread $tid" if $tid;
}
+$g_err = $err;
my %i = caller_info($i);
return "$err at $i{file} line $i{line}$tid_msg\.\n";