Re: DBD::Oracle forked client hangs with Oracle Client Libraries 19.20 and later.

2023-10-28 Thread Justin Schoeman
To follow up: with the hackaround from the previous post applied, and
test.pl modified to dispatch batches of 20 children, the run eventually
dies with a segfault (details below; coredump available on request).
I cannot be sure whether this is related to my hack or to the core
library.
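
test.pl itself is not shown in this message, so for readers without it:
the batching amounts to forking a fixed number of children, waiting for
all of them, and repeating. Below is a minimal C sketch of one batch.
It is not test.pl (which is Perl); run_batch() and child_work() are
hypothetical names, and child_work() is only a stand-in for the database
work each child would do.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the per-child work (a query in the real test). */
static int child_work(int index)
{
    (void)index;
    return 0;
}

/*
 * Fork batch_size children, wait for every child that was started, and
 * return how many did not finish cleanly (fork failure, death by signal,
 * or non-zero exit status).
 */
int run_batch(int batch_size)
{
    assert(batch_size > 0);

    int started = 0;
    for (int i = 0; i < batch_size; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* could not start more children */
        if (pid == 0)
            _exit(child_work(i));       /* child: do the work and leave */
        started++;
    }

    int failed = batch_size - started;  /* children that never started */
    for (int i = 0; i < started; i++) {
        int status = 0;
        pid_t pid = wait(&status);      /* blocks until some child ends */
        if (pid < 0) {
            failed++;
            continue;
        }
        if (WIFSIGNALED(status)) {
            fprintf(stderr, "child %ld killed by signal %d\n",
                    (long)pid, WTERMSIG(status));
            failed++;
        } else if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
            failed++;
        }
    }
    return failed;
}
```

Calling run_batch(20) in a loop gives the "batches of 20" shape. A child
that never exits makes wait() block, and a child that segfaults shows up
as WIFSIGNALED with SIGSEGV.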


Thanks,

Justin

   Message: Process 1141691 (test.pl) of user 1000 dumped core.

    Module libnss_resolve.so.2 from rpm systemd-253.10-1.fc38.x86_64
    Module libnss_mdns4_minimal.so.2 from rpm nss-mdns-0.15.1-8.fc38.x86_64
    Module libcap.so.2 from rpm libcap-2.48-6.fc38.x86_64
    Module libnss_myhostname.so.2 from rpm systemd-253.10-1.fc38.x86_64
    Module libnuma.so.1 from rpm numactl-2.0.16-2.fc38.x86_64
    Module libaio.so.1 from rpm libaio-0.3.111-15.fc38.x86_64
    Module libcrypt.so.2 from rpm libxcrypt-4.4.36-1.fc38.x86_64

    Stack trace of thread 1141697:
    #0  0x7f8bfe0b0884 __pthread_kill_implementation (libc.so.6 + 0x8e884)
    #1  0x7f8bfe05fafe raise (libc.so.6 + 0x3dafe)
    #2  0x7f8beeccb73f skgesigOSCrash (libclntsh.so.21.1 + 0x28cb73f)
    #3  0x7f8bef3ca9ed kpeDbgSignalHandler (libclntsh.so.21.1 + 0x2fca9ed)
    #4  0x7f8beeccba52 skgesig_sigactionHandler (libclntsh.so.21.1 + 0x28cba52)
    #5  0x7f8bfe05fbb0 __restore_rt (libc.so.6 + 0x3dbb0)
    #6  0x7f8bfe302e70 Perl_csighandler3 (libperl.so.5.36 + 0x102e70)
    #7  0x7f8bfe05fbb0 __restore_rt (libc.so.6 + 0x3dbb0)
    #8  0x7f8bfe0ab219 __futex_abstimed_wait_common (libc.so.6 + 0x89219)
    #9  0x7f8bfe0adf22 pthread_cond_timedwait@@GLIBC_2.3.2 (libc.so.6 + 0x8bf22)
    #10 0x7f8beb6abfc3 sltspctimewait (libclntshcore.so.21.1 + 0xabfc3)
    #11 0x7f8bec8cf501 kpucpincrtime (libclntsh.so.21.1 + 0x4cf501)
    #12 0x7f8bfe0ae947 start_thread (libc.so.6 + 0x8c947)
    #13 0x7f8bfe134860 __clone3 (libc.so.6 + 0x112860)

    Stack trace of thread 1141691:
    #0  0x7f8bfe1230ea read (libc.so.6 + 0x1010ea)
    #1  0x7f8bef8d02d0 snttread (libclntsh.so.21.1 + 0x34d02d0)
    #2  0x7f8bef8ceb35 nttfprd (libclntsh.so.21.1 + 0x34ceb35)
    #3  0x7f8bef8c3c95 nsbasic_brc (libclntsh.so.21.1 + 0x34c3c95)
    #4  0x7f8bef8b7e7f nioqrc (libclntsh.so.21.1 + 0x34b7e7f)
    #5  0x7f8bef8f7ecf ttcdrv (libclntsh.so.21.1 + 0x34f7ecf)
    #6  0x7f8bef8bdccd nioqwa (libclntsh.so.21.1 + 0x34bdccd)
    #7  0x7f8bef88b690 upirtrc (libclntsh.so.21.1 + 0x348b690)
    #8  0x7f8bef8a0331 kpurcsc (libclntsh.so.21.1 + 0x34a0331)
    #9  0x7f8bef890f40 kpuexec (libclntsh.so.21.1 + 0x3490f40)
    #10 0x7f8bef88ad49 OCIStmtExecute (libclntsh.so.21.1 + 0x348ad49)
    #11 0x7f8bfe5e2377 ora_st_execute (Oracle.so + 0x24377)
    #12 0x7f8bfe5cad7d XS_DBD__Oracle__st_execute (Oracle.so + 0xcd7d)
    #13 0x7f8bfe61b106 XS_DBI_dispatch (DBI.so + 0xf106)
    #14 0x7f8bfe3265aa Perl_pp_entersub (libperl.so.5.36 + 0x1265aa)
    #15 0x7f8bfe317958 Perl_runops_standard (libperl.so.5.36 + 0x117958)
    #16 0x7f8bfe27c3e7 Perl_call_sv (libperl.so.5.36 + 0x7c3e7)
    #17 0x7f8bfe61b164 XS_DBI_dispatch (DBI.so + 0xf164)
    #18 0x7f8bfe3265aa Perl_pp_entersub (libperl.so.5.36 + 0x1265aa)
    #19 0x7f8bfe317958 Perl_runops_standard (libperl.so.5.36 + 0x117958)
    #20 0x7f8bfe28259d perl_run (libperl.so.5.36 + 0x8259d)
    #21 0x56551ca1734a main (perl + 0x134a)
    #22 0x7f8bfe049b8a __libc_start_call_main (libc.so.6 + 0x27b8a)
    #23 0x7f8bfe049c4b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x27c4b)
    #24 0x56551ca17385 _start (perl + 0x1385)
    ELF object binary architecture: AMD x86-64

On 2023/10/25 15:39, Justin Schoeman wrote:


DBD::Oracle forked client hangs with Oracle Client Libraries 19.20 and later.

2023-10-25 Thread Justin Schoeman

Good day,

I am trying to dig into an issue after a recent Oracle update. The 
application has been running flawlessly from version 9.x to 18.5, and 
has only broken on the most recent client library updates.


Attached is a cut down application which demonstrates the issue.

The forked child processes never exit. Attaching to them with GDB gives:

(gdb) t a a bt

Thread 1 (Thread 0x7f0c48a62c40 (LWP 773676) "test.pl"):
#0  futex_wait (private=<optimized out>, expected=14, futex_word=0x55c5ef1c96e4) at ../sysdeps/nptl/futex-internal.h:146
#1  futex_wait_simple (private=<optimized out>, expected=14, futex_word=0x55c5ef1c96e4) at ../sysdeps/nptl/futex-internal.h:177
#2  __GI___pthread_cond_destroy (cond=0x55c5ef1c96c0) at pthread_cond_destroy.c:53
#3  0x7f0c35cac072 in sltspcdestroy () from /usr/lib/oracle/21/client64/lib/libclntshcore.so.21.1
#4  0x7f0c36ecf68e in kpucpstopthr () from /usr/lib/oracle/21/client64/lib/libclntsh.so.21.1
#5  0x7f0c39e9a42e in kpufhndl0 () from /usr/lib/oracle/21/client64/lib/libclntsh.so.21.1
#6  0x7f0c48a1028a in ora_db_destroy (dbh=dbh@entry=0x55c5ef32c5b0, imp_dbh=imp_dbh@entry=0x55c5ef035390) at /root/.local/share/.cpan/build/DBD-Oracle-1.83-2/dbdimp.c:1263
#7  0x7f0c48a10752 in XS_DBD__Oracle__db_DESTROY (my_perl=0x55c5eee272a0, cv=<optimized out>) at ./Oracle.xsi:432
#8  0x7f0c48a4d106 in XS_DBI_dispatch (my_perl=0x55c5eee272a0, cv=0x55c5ef13b9c8) at /usr/src/debug/perl-DBI-1.643-15.fc38.x86_64/DBI.xs:3783
#9  0x7f0c487265aa in Perl_pp_entersub (my_perl=0x55c5eee272a0) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/pp_hot.c:5353
#10 0x7f0c4867c7a6 in Perl_call_sv (my_perl=<optimized out>, sv=<optimized out>, flags=<optimized out>) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/perl.c:3057
#11 0x7f0c4872c643 in S_curse (my_perl=my_perl@entry=0x55c5eee272a0, sv=sv@entry=0x55c5ef34af38, check_refcnt=check_refcnt@entry=true) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/sv.c:6973
#12 0x7f0c4872cd88 in Perl_sv_clear (my_perl=my_perl@entry=0x55c5eee272a0, orig_sv=orig_sv@entry=0x55c5ef079858) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/sv.c:6536
#13 0x7f0c4872d3a2 in Perl_sv_free2 (my_perl=0x55c5eee272a0, sv=0x55c5ef079858, rc=<optimized out>) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/sv.c:7073
#14 0x7f0c486fc878 in Perl_SvREFCNT_dec (sv=<optimized out>, my_perl=0x55c5eee272a0) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/inline.h:405
#15 S_mg_free_struct (my_perl=0x55c5eee272a0, sv=<optimized out>, mg=0x55c5ef34bb70) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/mg.c:563
#16 0x7f0c486fc8c9 in Perl_mg_free (my_perl=0x55c5eee272a0, sv=0x55c5ef34aec0) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/mg.c:585
#17 0x7f0c4872cdc3 in Perl_sv_clear (my_perl=my_perl@entry=0x55c5eee272a0, orig_sv=orig_sv@entry=0x55c5ef34aec0) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/sv.c:6544
#18 0x7f0c4872d3a2 in Perl_sv_free2 (my_perl=0x55c5eee272a0, sv=0x55c5ef34aec0, rc=<optimized out>) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/sv.c:7073
#19 0x7f0c48722646 in Perl_SvREFCNT_dec_NN (sv=<optimized out>, my_perl=<optimized out>) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/inline.h:419
#20 do_clean_objs (ref=0x55c5ef338ce0, my_perl=<optimized out>) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/sv.c:448
#21 S_visit (mask=2048, flags=2048, f=<optimized out>, my_perl=0x55c5eee272a0) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/sv.c:391
#22 Perl_sv_clean_objs (my_perl=0x55c5eee272a0) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/sv.c:542
#23 0x7f0c48680160 in perl_destruct (my_perl=0x55c5eee272a0) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/perl.c:880
#24 0x55c5eda0f324 in main (argc=<optimized out>, argv=<optimized out>, env=<optimized out>) at /usr/src/debug/perl-5.36.1-497.fc38.x86_64/perlmain.c:121


I have instrumented dbdimp.c as follows:

    /* free global environment handle during destruction of last connection */
    } else if ( (imp_dbh->envhp == imp_drh->envhp)
                && (SvTRUE(perl_get_sv("DBI::PERL_ENDING",0))) ) {
        PerlIO_printf(PerlIO_stderr(), "EXIT1\n");
        OCIHandleFree_log_stat(imp_dbh, imp_dbh->envhp, OCI_HTYPE_ENV, status);
        PerlIO_printf(PerlIO_stderr(), "EXIT2\n");
        if ( status == OCI_SUCCESS ) {
            PerlIO_printf(PerlIO_stderr(), "EXIT3\n");
            imp_dbh->envhp = NULL;
            imp_drh->envhp = NULL;
        }
    }

And the child prints 'EXIT1', but never 'EXIT2'. If I comment out the 
call to OCIHandleFree_log_stat() then the child process exits without 
error (although there are random segfaults after long run times).
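
One possible reading of the backtrace above, offered as an assumption and
not something confirmed in this thread: the client library seems to keep
a helper thread parked on a condition variable; fork() copies the
condition variable's record of that waiter into the child but not the
thread itself; pthread_cond_destroy() in the child then waits for a
waiter that can never leave. The C sketch below tries to reproduce that
pattern with no Oracle code at all. Every name in it is made up for
illustration, and the result depends on the C library, since POSIX leaves
destroying a condition variable with waiters undefined. The function
therefore just reports whichever outcome it observes.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int waiting = 0;   /* set once the helper thread is about to wait */
static int stop    = 0;   /* tells the helper thread to finish */

/* Helper thread: parks on the condition variable until told to stop. */
static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    waiting = 1;
    while (!stop)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/*
 * Returns 1 if the forked child blocked in pthread_cond_destroy() until
 * its alarm killed it, 0 if the destroy call returned, -1 on setup error.
 */
int child_hangs_in_cond_destroy(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, waiter, NULL) != 0)
        return -1;

    /* Wait until the helper holds a waiter slot on the condition variable. */
    for (;;) {
        pthread_mutex_lock(&lock);
        int ready = waiting;
        pthread_mutex_unlock(&lock);
        if (ready)
            break;
        struct timespec ts = { 0, 1000000 };   /* 1 ms */
        nanosleep(&ts, NULL);
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only this thread exists here, but cond still records a
         * waiter. This use is outside what POSIX guarantees after fork()
         * in a multithreaded process; it is done on purpose. */
        alarm(2);                       /* safety net if destroy blocks */
        pthread_cond_destroy(&cond);
        _exit(0);
    }

    int result = -1;
    int status = 0;
    if (pid > 0 && waitpid(pid, &status, 0) == pid) {
        if (WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM)
            result = 1;                 /* child never got past destroy */
        else if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            result = 0;                 /* destroy returned normally */
    }

    /* Release the helper thread in the parent and clean up. */
    pthread_mutex_lock(&lock);
    stop = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    pthread_join(tid, NULL);
    return result;
}
```

To try it, call child_hangs_in_cond_destroy() from main(), print the
result, and build with -pthread. If the C library blocks in this
situation, the child should be killed by its own alarm after two seconds
and the function should return 1. A return of 0 would argue against this
reading on that platform.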


1. A complete log of all steps of the build, e.g.:

Attached (build.log)

2. Full details of which version of Oracle client and server you're using

# rpm -qa | grep oracle
oracle-instantclient-basic-21.11.0.0.0-1.x86_64
oracle-instantclient-devel-21.11.0.0.0-1.x86_64


3. The output of perl