Package: libopenblas0-pthread
Version: 0.3.21+ds-2
Severity: important
X-Debbugs-Cc: ermilovdimi...@gmail.com

Architecture: riscv64

Reproducer:
Unit tests of the scipy library (which is based on openblas) produce segfaults:
python <env>/bin/pytest \
    riscv_env_debug/lib/python3.9/site-packages/scipy/integrate/tests/test_quadrature.py \
    -k test_scalar

Call stack in gdb:
#0  _Py_CheckFunctionResult (tstate=0xaaaaaaaad7b2d0,
callable=0xffffffe9213ea0, result=0xffffffff, where=0x0) at
Objects/call.c:58
#1  0x00aaaaaaaab0d294 in _PyObject_VectorcallTstate
(kwnames=0xffffffe8fa4dc0, nargsf=9223372036854775809,
args=0xaaaaaaab387fe8,
    callable=0xffffffe9213ea0, tstate=<optimized out>) at
./Include/cpython/abstract.h:116
#2  _PyObject_VectorcallTstate (kwnames=0xffffffe8fa4dc0,
nargsf=9223372036854775809, args=0xaaaaaaab387fe8,
callable=0xffffffe9213ea0,
    tstate=<optimized out>) at ./Include/cpython/abstract.h:103
#3  PyObject_Vectorcall (kwnames=0xffffffe8fa4dc0,
nargsf=9223372036854775809, args=0xaaaaaaab387fe8,
callable=0xffffffe9213ea0)
    at ./Include/cpython/abstract.h:127
#4  call_function (kwnames=0xffffffe8fa4dc0, oparg=<optimized out>,
pp_stack=<synthetic pointer>, tstate=0xaaaaaaaad7b2d0) at
Python/ceval.c:5077
#5  _PyEval_EvalFrameDefault (tstate=0xaaaaaaaad7b2d0, f=0xaaaaaaab387dc0,
throwflag=<optimized out>) at Python/ceval.c:3537
#6  0x00aaaaaaaab8daa2 in _PyEval_EvalFrame (throwflag=0,
f=0xaaaaaaab387dc0, tstate=0xaaaaaaaad7b2d0) at
./Include/internal/pycore_ceval.h:40
#7  _PyEval_EvalCode (tstate=0xaaaaaaaad7b2d0, _co=0xffffffe8fa52f0,
globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>,
    argcount=1, kwnames=<optimized out>, kwargs=0xffffffe7eb91e8,
kwcount=6, kwstep=kwstep@entry=1, defs=0xffffffe8f9ee98, defcount=7,
kwdefs=0x0,
    closure=0x0, name=0xffffffe8fa2030, qualname=0xffffffe8fa2030) at
Python/ceval.c:4329
#8  0x00aaaaaaaab17422 in _PyFunction_Vectorcall (func=<optimized out>,
stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at Objects/call.c:396

The call stack does not point directly at openblas, but further debugging
traced the problem to the openblas package. I am skipping the debugging steps
and intermediate findings (I can share them if needed) and posting only the
final results.

Root cause:
See
https://salsa.debian.org/science-team/openblas/-/blob/master/debian/patches/no-embedded-lapack.patch#L28

+netlib :
+ifeq (,$(findstring 64,$(LIBNAME)))
+ mkdir lapack-netlib
+ cd lapack-netlib && ar -x /usr/lib/$$(dpkg-architecture -q DEB_HOST_MULTIARCH)/liblapack_pic.a
+ make -C interface delete-duplicate-lapack-objects
+ ar -ru $(LIBNAME) `LC_ALL=C ls lapack-netlib/*`
 else
-re_lapack :
- @$(MAKE) -C relapack
+ mkdir lapack64-netlib
+ cd lapack64-netlib && ar -x /usr/lib/$$(dpkg-architecture -q DEB_HOST_MULTIARCH)/liblapack64_pic.a
+ make -C interface delete-duplicate-lapack-objects
+ ar -ru $(LIBNAME) `LC_ALL=C ls lapack64-netlib/*`
 endif

Depending on the build type, the patch is meant to take liblapack from either
the liblapack-dev or the liblapack64-dev package, which is fine in principle.
However, according to the build logs, liblapack_pic.a is never used and
liblapack64_pic.a is always used instead: see
https://buildd.debian.org/status/fetch.php?pkg=openblas&arch=riscv64&ver=0.3.21%2Bds-2&stamp=1663168927&raw=0
and grep for "ar -x". The test "ifeq (,$(findstring 64,$(LIBNAME)))" is buggy
because LIBNAME is either libopenblas_riscv64_genericp-r0.3.21.a or
libopenblas64_riscv64_genericp-r0.3.21.a. In other words, it always contains
the string "64", because the architecture name riscv64 does. So the ILP64
version of liblapack is used when building the LP64 version of libopenblas.
At run time the LP64 version of numpy loads the LP64 version of openblas,
which contains the ILP64 Fortran object files from the liblapack package.
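
To make the failure mode concrete, here is a minimal Python sketch that
emulates the make check quoted above. The helpers findstring() and
picks_lp64_branch() are hypothetical names introduced only for this
illustration, and the third library name is a made-up one without "64",
added purely for contrast. It is not part of the openblas build.

```python
def findstring(needle: str, haystack: str) -> str:
    """Emulate GNU make's $(findstring needle,haystack):
    return needle if it occurs in haystack, otherwise the empty string."""
    return needle if needle in haystack else ""


def picks_lp64_branch(libname: str) -> bool:
    """Mirror `ifeq (,$(findstring 64,$(LIBNAME)))`:
    the LP64 branch is taken only when "64" is absent from the name."""
    return findstring("64", libname) == ""


# Library names as quoted in this report.
LP64_NAME = "libopenblas_riscv64_genericp-r0.3.21.a"
ILP64_NAME = "libopenblas64_riscv64_genericp-r0.3.21.a"
# Hypothetical name without "64", for contrast only.
CONTRAST_NAME = "libopenblas_genericp-r0.3.21.a"

for name in (LP64_NAME, ILP64_NAME, CONTRAST_NAME):
    archive = "liblapack_pic.a" if picks_lp64_branch(name) else "liblapack64_pic.a"
    print(f"{name} -> {archive}")
```

Both riscv64 names select liblapack64_pic.a, which matches what the build
log shows. Only the contrast name reaches the liblapack_pic.a branch.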

In other words, the LP64 openblas packages for riscv64 are fully broken. I am
not sure what the proper fix should be: either rename
libopenblas_riscv64_genericp/libopenblas64_riscv64_genericp to something
different, or fix the check in no-embedded-lapack.patch. If you
advise, I can try to contribute a patch.
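
As a rough sketch of the second option, the check could look at the library
name prefix instead of searching for "64" anywhere in the name. The Python
below only models that idea. It assumes that the ILP64 build is always the
one whose name starts with "libopenblas64", which holds for the two names in
this report but is not verified for other configurations. The function names
are hypothetical. In make terms this would roughly correspond to searching
for a longer substring such as "openblas64".

```python
def is_ilp64_libname(libname: str) -> bool:
    # Assumption: only the ILP64 variant is named "libopenblas64...".
    return libname.startswith("libopenblas64")


def lapack_archive(libname: str) -> str:
    """Pick the LAPACK archive that matches the integer width of the build."""
    return "liblapack64_pic.a" if is_ilp64_libname(libname) else "liblapack_pic.a"


# Library names as quoted in this report.
for name in (
    "libopenblas_riscv64_genericp-r0.3.21.a",
    "libopenblas64_riscv64_genericp-r0.3.21.a",
):
    print(f"{name} -> {lapack_archive(name)}")
```

With this check the LP64 name selects liblapack_pic.a and the ILP64 name
selects liblapack64_pic.a, regardless of the "64" in the architecture name.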

--
Regards,
Dmitry
