Bug#1064810: Bug#1067055: openmpi: error: implicit declaration of function 'OPAL_THREAD_ADD_FETCH64'

2024-04-03 Thread Alastair McKinstry


On 02/04/2024 21:29, Sebastian Ramacher wrote:

> To be honest, I don't see these two changes (changing mpi-defaults to
> mpich on 32 bit; breaking 32 bit build of openmpi) to be ready. It'd be
> preferable to reinstate a 32-bit compatible pmix and fix openmpi on 32
> bit until the time_t transition is done.
>
> Cheers


It looks like libpmix-dev is only used by mpich, openmpi and slurm-wlm.

mpich will be configured not to use pmix on 32-bit systems anyway.

slurm-wlm builds ok without pmix; it can be patched to use pmix only on 
64-bit systems.


openmpi in sid (4.1.6-7) has an internal copy of pmix 4.1.2 that it can 
be configured to use.


I can prepare this for openmpi on the debian/trixie branch, to upload
with a fix for #1067055.


regards

Alastair

--
Alastair McKinstry,
GPG: 82383CE9165B347C787081A2CBE6BB4E5D9AD3A5
ph: +353 87 6847928 e:alast...@mckinstry.ie, im: @alastair:mckinstry.ie


Bug#1064810: Bug#1067055: openmpi: error: implicit declaration of function 'OPAL_THREAD_ADD_FETCH64'

2024-04-03 Thread Alastair McKinstry



On 02/04/2024 21:29, Sebastian Ramacher wrote:



> > OpenMPI 5 drops 32-bit support, but otherwise does not change the API/ABI.
> > So it is technically not a transition, but breaks 32-bit builds.
>
> Doesn't make it better. This is not the time to do that without test
> builds and bugs filed.
>
> > The solution is changing mpi-defaults to MPICH for 32-bit archs. MPICH
> > builds on all archs, but the effect of the change on all dependencies has
> > not been tested, and I don't know how you would do that - setting up eg
> > ratt to rebuild all on 32-bit archs (as everything on 64-bit will not have
> > changed.)
>
> Beside the easy part of changing mpi-defaults, I count 30 something
> packages that have explicit build dependencies on libopenmpi-dev. None
> of those packages has bugs filed to change to mpich on 32 bit
> architectures.
>
> To be honest, I don't see these two changes (changing mpi-defaults to
> mpich on 32 bit; breaking 32 bit build of openmpi) to be ready. It'd be
> preferable to reinstate a 32-bit compatible pmix and fix openmpi on 32
> bit until the time_t transition is done.
>
> Cheers



I ran "build-rdeps libopenmpi-dev" and checked the packages it listed.
They are mostly false alarms.


What is needed:

* mpich: must not use libpmix on 32-bit archs. I've a patch I'm testing.

* armci-mpi: builds against both mpich and openmpi. Needs work to build
  against openmpi only on 64-bit. #10683219

* code-saturne: uses the default MPI version of hdf5. #1068318

* adios: fix just uploaded.

* vtk9: depends on libhdf5-openmpi-dev instead of libhdf5-mpi-dev. #1068321

* trilinos: depends on openmpi, but is only available on 64-bit systems.
  No change needed.

* hdf5: needs to depend on libopenmpi-dev on 64-bit archs only. #1068320

* scalapack: needs to depend on libopenmpi-dev on 64-bit archs only. #1068322


Regards

Alastair






Bug#1064810: Bug#1067055: openmpi: error: implicit declaration of function 'OPAL_THREAD_ADD_FETCH64'

2024-04-02 Thread Sebastian Ramacher
On 2024-04-02 07:13:38 +0100, Alastair McKinstry wrote:
> 
> On 01/04/2024 23:25, Sebastian Ramacher wrote:
> > > There is a transition to openmpi-5 / mpi-defaults which is stalled by the
> > > t64 transition.
> > > 
> > > It drops 32-bit support from OpenMPI.
> > > 
> > > Because of this, I don't think the solution is to  port 32-bit atomics for
> > > armel/armhf, as it will be removed in a few weeks/months.
> > > 
> > > While we didn't want the transitions to be done simultaneously, it might
> > > be the best answer.
> > > 
> > > 
> > > What does the release team think?
> > Adding another transition on top will just delay the time_t transition
> > even more. So if we can avoid that, I'd prefer to not do this transition
> > now. Unfortunately, uploads such as the one of pmix that now dropped
> > support for 32 bit architectures (#1068211) are not really helpful.
> > 
> > Also, #1064810 has no information on test builds with the new
> > mpi-defaults on a 32 bit architecture. So has this transition been
> > tested?
> > 
> > Cheers
> 
> OpenMPI 5 drops 32-bit support, but otherwise does not change the API/ABI.
> So it is technically not a transition, but breaks 32-bit builds.

Doesn't make it better. This is not the time to do that without test
builds and bugs filed.

> The solution is changing mpi-defaults to MPICH for 32-bit archs. MPICH
> builds on all archs, but the effect of the change on all dependencies has
> not been tested, and I don't know how you would do that - setting up eg
> ratt to rebuild all on 32-bit archs (as everything on 64-bit will not have
> changed.)

Beside the easy part of changing mpi-defaults, I count 30 something
packages that have explicit build dependencies on libopenmpi-dev. None
of those packages has bugs filed to change to mpich on 32 bit
architectures.

To be honest, I don't see these two changes (changing mpi-defaults to
mpich on 32 bit; breaking 32 bit build of openmpi) to be ready. It'd be
preferable to reinstate a 32-bit compatible pmix and fix openmpi on 32
bit until the time_t transition is done.

Cheers
-- 
Sebastian Ramacher



Bug#1064810: Bug#1067055: openmpi: error: implicit declaration of function 'OPAL_THREAD_ADD_FETCH64'

2024-04-02 Thread Alastair McKinstry


On 01/04/2024 23:25, Sebastian Ramacher wrote:

> > There is a transition to openmpi-5 / mpi-defaults which is stalled by the
> > t64 transition.
> >
> > It drops 32-bit support from OpenMPI.
> >
> > Because of this, I don't think the solution is to port 32-bit atomics for
> > armel/armhf, as it will be removed in a few weeks/months.
> >
> > While we didn't want the transitions to be done simultaneously, it might be
> > the best answer.
> >
> > What does the release team think?
>
> Adding another transition on top will just delay the time_t transition
> even more. So if we can avoid that, I'd prefer to not do this transition
> now. Unfortunately, uploads such as the one of pmix that now dropped
> support for 32 bit architectures (#1068211) are not really helpful.
>
> Also, #1064810 has no information on test builds with the new
> mpi-defaults on a 32 bit architecture. So has this transition been
> tested?
>
> Cheers


OpenMPI 5 drops 32-bit support, but otherwise does not change the 
API/ABI. So it is technically not a transition, but breaks 32-bit builds.


The solution is changing mpi-defaults to MPICH for 32-bit archs. MPICH
builds on all archs, but the effect of the change on all dependencies has
not been tested, and I don't know how you would do that - setting up eg
ratt to rebuild all on 32-bit archs (as everything on 64-bit will not have
changed.)


I'm sorry I missed the dropped 32-bit support for pmix; I tested on 
64-bit platforms only.


Regards

Alastair




Bug#1064810: Bug#1067055: openmpi: error: implicit declaration of function 'OPAL_THREAD_ADD_FETCH64'

2024-04-01 Thread Sebastian Ramacher
On 2024-04-01 12:05:30 +0100, Alastair McKinstry wrote:
> 
> On 23/03/2024 01:58, Thorsten Glaser wrote:
> > Andrey Rakhmatullin dixit:
> > 
> > > OPAL_THREAD_ADD_FETCH64 is defined under #if OPAL_HAVE_ATOMIC_MATH_64
> > > And I assume this arch doesn't have 64-bit atomics.
> > No native ones, yes.
> > 
> > I *think* either libatomic or libatomic_ops(?) make them
> > available, but very slowly, using a syscall to guarantee
> > atomicity (those systems are normally uniprocessor) on
> > m68k.
> > 
> > If possible, avoiding them would be preferable. (For
> > example, in some cases, like reading a 64-bit timestamp,
> > if the writing direction is known and stable, reading
> > twice then comparing is a possible alternative at least
> > for some architectures; e.g. I know BSD code for sparc
> > does it that way.)
> > 
> > I guess you’ll have to ask the porters of 32-bit arches
> > with no native 64-bit atomics for details.
> > 
> > Though I had thought GCC’s builtin atomics use the
> > aforementioned kernel-based workaround from that library
> > these days?
> 
> There is a transition to openmpi-5 / mpi-defaults which is stalled by the
> t64 transition.
> 
> It drops 32-bit support from OpenMPI.
> 
> Because of this, I don't think the solution is to  port 32-bit atomics for
> armel/armhf, as it will be removed in a few weeks/months.
> 
> While we didn't want the transitions to be done simultaneously, it might be
> the best answer.
> 
> 
> What does the release team think?

Adding another transition on top will just delay the time_t transition
even more. So if we can avoid that, I'd prefer to not do this transition
now. Unfortunately, uploads such as the one of pmix that now dropped
support for 32 bit architectures (#1068211) are not really helpful.

Also, #1064810 has no information on test builds with the new
mpi-defaults on a 32 bit architecture. So has this transition been
tested?

Cheers
-- 
Sebastian Ramacher



Bug#1067055: openmpi: error: implicit declaration of function 'OPAL_THREAD_ADD_FETCH64'

2024-04-01 Thread Andrey Rakhmatullin
On Mon, Apr 01, 2024 at 12:05:30PM +0100, Alastair McKinstry wrote:
> There is a transition to openmpi-5 / mpi-defaults which is stalled by the
> t64 transition.
> 
> It drops 32-bit support from OpenMPI.
> 
> Because of this, I don't think the solution is to  port 32-bit atomics for
> armel/armhf, as it will be removed in a few weeks/months.
> 
> While we didn't want the transitions to be done simultaneously, it might be
> the best answer.
It might have been somewhat easier for armel/armhf bootstrapping/rebuilding
if MPI stuff had been dropped there early, but that has already finished
successfully, so it doesn't matter.
Note that openmpi built successfully on all release architectures, so this
bug doesn't apply to them anyway.

-- 
WBR, wRAR




Bug#1067055: openmpi: error: implicit declaration of function 'OPAL_THREAD_ADD_FETCH64'

2024-04-01 Thread Alastair McKinstry


On 23/03/2024 01:58, Thorsten Glaser wrote:

> Andrey Rakhmatullin dixit:
>
> > OPAL_THREAD_ADD_FETCH64 is defined under #if OPAL_HAVE_ATOMIC_MATH_64
> > And I assume this arch doesn't have 64-bit atomics.
>
> No native ones, yes.
>
> I *think* either libatomic or libatomic_ops(?) make them
> available, but very slowly, using a syscall to guarantee
> atomicity (those systems are normally uniprocessor) on
> m68k.
>
> If possible, avoiding them would be preferable. (For
> example, in some cases, like reading a 64-bit timestamp,
> if the writing direction is known and stable, reading
> twice then comparing is a possible alternative at least
> for some architectures; e.g. I know BSD code for sparc
> does it that way.)
>
> I guess you’ll have to ask the porters of 32-bit arches
> with no native 64-bit atomics for details.
>
> Though I had thought GCC’s builtin atomics use the
> aforementioned kernel-based workaround from that library
> these days?

There is a transition to openmpi-5 / mpi-defaults which is stalled by
the t64 transition.

It drops 32-bit support from OpenMPI.

Because of this, I don't think the solution is to port 32-bit atomics
for armel/armhf, as it will be removed in a few weeks/months.

While we didn't want the transitions to be done simultaneously, it might
be the best answer.

What does the release team think?

> bye,
> //mirabilos




Bug#1067055: openmpi: error: implicit declaration of function 'OPAL_THREAD_ADD_FETCH64'

2024-03-22 Thread Thorsten Glaser
Andrey Rakhmatullin dixit:

>OPAL_THREAD_ADD_FETCH64 is defined under #if OPAL_HAVE_ATOMIC_MATH_64

Yes.

>, and I suspect not all of its uses also are.

That’s what I get from this, yes.

>And I assume this arch doesn't have 64-bit atomics.

No native ones, yes.

I *think* either libatomic or libatomic_ops(?) make them
available, but very slowly, using a syscall to guarantee
atomicity (those systems are normally uniprocessor) on
m68k.

If possible, avoiding them would be preferable. (For
example, in some cases, like reading a 64-bit timestamp,
if the writing direction is known and stable, reading
twice then comparing is a possible alternative at least
for some architectures; e.g. I know BSD code for sparc
does it that way.)
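
One possible reading of the "read twice, then compare" idea, as a minimal
sketch only. It is not taken from BSD or OpenMPI; every demo_ name is
hypothetical. It assumes the value only ever increases and that the writer
runs to completion with respect to the reader (for example an interrupt
handler on a uniprocessor). Without those assumptions a sequence lock or
real 64-bit atomics would be needed instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit monotonic timestamp kept as two 32-bit words. */
struct demo_stamp {
    volatile uint32_t hi;
    volatile uint32_t lo;
};

/* Writer side. ASSUMPTION: never interrupted by a reader half-way. */
static void demo_stamp_store(struct demo_stamp *s, uint64_t value)
{
    assert(s != NULL);
    s->hi = (uint32_t)(value >> 32);
    s->lo = (uint32_t)value;
}

/* Reader side: read hi, lo, then hi again; retry if hi moved meanwhile. */
static uint64_t demo_stamp_load(const struct demo_stamp *s)
{
    uint32_t hi1, lo, hi2;

    assert(s != NULL);
    do {
        hi1 = s->hi;
        lo = s->lo;
        hi2 = s->hi;
    } while (hi1 != hi2);

    return ((uint64_t)hi1 << 32) | lo;
}
```

The retry loop only costs something when the high word actually changes
between the two reads, which for a timestamp should be rare.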

I guess you’ll have to ask the porters of 32-bit arches
with no native 64-bit atomics for details.

Though I had thought GCC’s builtin atomics use the
aforementioned kernel-based workaround from that library
these days?
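
For reference, a small sketch of what using GCC's builtin atomics for a
64-bit add looks like. This assumes GCC or Clang; the demo_ names are
hypothetical. On targets where the compiler cannot inline the operation it
emits a call to a library routine that libatomic provides, so linking with
-latomic may be required there. Whether that routine ends up in a kernel
helper depends on the architecture, and the sketch makes no claim about
m68k specifically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Add delta to *addr atomically and return the new value. */
static uint64_t demo_add_fetch64(uint64_t *addr, uint64_t delta)
{
    assert(addr != NULL);
    return __atomic_add_fetch(addr, delta, __ATOMIC_SEQ_CST);
}

/* True if 64-bit atomics are always lock-free on this target, i.e. the
 * compiler never needs the libatomic fallback for them. */
static bool demo_u64_is_lock_free(void)
{
    return __atomic_always_lock_free(sizeof(uint64_t), 0);
}
```

A configure check could use something like demo_u64_is_lock_free() to
decide whether the slow fallback path is acceptable or should be avoided.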

bye,
//mirabilos
-- 
When he found out that the m68k port was in a pretty bad shape, he did
not, like many before him, shrug and move on; instead, he took it upon
himself to start compiling things, just so he could compile his shell.
How's that for dedication. -- Wouter, about my Debian/m68k revival



Bug#1067055: openmpi: error: implicit declaration of function 'OPAL_THREAD_ADD_FETCH64'

2024-03-22 Thread Andrey Rakhmatullin
On Sun, Mar 17, 2024 at 06:30:27PM +, Thorsten Glaser wrote:
> In file included from ../../../../../../opal/mca/btl/ofi/btl_ofi_rdma.c:14:
> ../../../../../../opal/mca/btl/ofi/btl_ofi_rdma.c: In function 'mca_btl_ofi_get':
> ../../../../../../opal/mca/btl/ofi/btl_ofi_rdma.h:33:13: error: implicit declaration of function 'OPAL_THREAD_ADD_FETCH64'; did you mean 'OPAL_THREAD_ADD_FETCH32'? [-Werror=implicit-function-declaration]
>    33 | OPAL_THREAD_ADD_FETCH64(&(module)->outstanding_rdma, 1); \
>       | ^~~
> ../../../../../../opal/mca/btl/ofi/btl_ofi_rdma.c:70:5: note: in expansion of macro 'MCA_BTL_OFI_NUM_RDMA_INC'
>    70 | MCA_BTL_OFI_NUM_RDMA_INC(ofi_btl);

OPAL_THREAD_ADD_FETCH64 is defined under #if OPAL_HAVE_ATOMIC_MATH_64, and
I suspect not all of its uses are guarded the same way. And I assume this
arch doesn't have 64-bit atomics.
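
To illustrate the suspected failure mode, a minimal sketch that is not
OpenMPI code. All DEMO_/demo_ names are hypothetical stand-ins for the OPAL
macros, whose real definitions may differ. It shows a 64-bit helper macro
that only exists under a feature guard, and a caller that applies the same
guard and falls back to a 32-bit counter. That is one possible way to avoid
the implicit declaration, not necessarily the fix upstream would choose.
GCC or Clang is assumed for the __atomic builtins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OPAL_HAVE_ATOMIC_MATH_64. Build with
 * -DDEMO_HAVE_ATOMIC_MATH_64=0 to mimic a target without 64-bit atomics. */
#ifndef DEMO_HAVE_ATOMIC_MATH_64
#define DEMO_HAVE_ATOMIC_MATH_64 1
#endif

#define DEMO_ADD_FETCH32(addr, v) \
    __atomic_add_fetch((addr), (v), __ATOMIC_RELAXED)

#if DEMO_HAVE_ATOMIC_MATH_64
/* Only defined when the platform advertises 64-bit atomic math. */
#define DEMO_ADD_FETCH64(addr, v) \
    __atomic_add_fetch((addr), (v), __ATOMIC_RELAXED)
typedef int64_t demo_counter_t;
#define DEMO_COUNTER_INC(c) DEMO_ADD_FETCH64((c), 1)
#else
/* Fallback: the caller never names the 64-bit macro, so nothing is
 * implicitly declared on targets that lack it. */
typedef int32_t demo_counter_t;
#define DEMO_COUNTER_INC(c) DEMO_ADD_FETCH32((c), 1)
#endif

/* Hypothetical module with an outstanding-operations counter. */
struct demo_module {
    demo_counter_t outstanding_rdma;
};

/* Increment the counter and return its new value. */
static demo_counter_t demo_rdma_inc(struct demo_module *m)
{
    assert(m != NULL);
    return DEMO_COUNTER_INC(&m->outstanding_rdma);
}
```

If the caller used DEMO_ADD_FETCH64 unconditionally, a build with the guard
set to 0 would see an undeclared identifier, which is the same class of
error as the one quoted above.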

-- 
WBR, wRAR




Bug#1067055: openmpi: error: implicit declaration of function 'OPAL_THREAD_ADD_FETCH64'

2024-03-17 Thread Thorsten Glaser
Source: openmpi
Version: 4.1.6-7
Severity: serious
Justification: ftbfs
Tags: ftbfs
Tag: ftbfs
X-Debbugs-Cc: t...@mirbsd.de

[…]
libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../../../../opal/mca/btl/ofi 
-I../../../../opal/include -I../../../../ompi/include 
-I../../../../oshmem/include 
-I../../../../opal/mca/hwloc/hwloc201/hwloc/include/private/autogen 
-I../../../../opal/mca/hwloc/hwloc201/hwloc/include/hwloc/autogen 
-I../../../../ompi/mpiext/cuda/c -I../../../../../.. -I../../../.. 
-I../../../../../../opal/include -I../../../../../../orte/include 
-I../../../../orte/include -I../../../../../../ompi/include 
-I../../../../../../oshmem/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 
-D_TIME_BITS=64 -Wdate-time -D_FORTIFY_SOURCE=2 -I/usr/local/include 
-I/usr/local/include -DNDEBUG -g -O2 -Werror=implicit-function-declaration 
-ffile-prefix-map=/tmp/buildd/openmpi-4.1.6=. -fstack-protector-strong -Wformat 
-Werror=format-security -O3 -finline-functions -fno-strict-aliasing -c 
../../../../../../opal/mca/btl/ofi/btl_ofi_rdma.c  -fPIC -DPIC -o 
.libs/btl_ofi_rdma.o
In file included from ../../../../../../opal/mca/btl/ofi/btl_ofi_rdma.c:14:
../../../../../../opal/mca/btl/ofi/btl_ofi_rdma.c: In function 'mca_btl_ofi_get':
../../../../../../opal/mca/btl/ofi/btl_ofi_rdma.h:33:13: error: implicit declaration of function 'OPAL_THREAD_ADD_FETCH64'; did you mean 'OPAL_THREAD_ADD_FETCH32'? [-Werror=implicit-function-declaration]
   33 | OPAL_THREAD_ADD_FETCH64(&(module)->outstanding_rdma, 1); \
      | ^~~
../../../../../../opal/mca/btl/ofi/btl_ofi_rdma.c:70:5: note: in expansion of macro 'MCA_BTL_OFI_NUM_RDMA_INC'
   70 | MCA_BTL_OFI_NUM_RDMA_INC(ofi_btl);
      | ^~~~
../../../../../../opal/mca/btl/ofi/btl_ofi_rdma.c:82:40: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   82 | remote_address = (remote_address - (uint64_t) remote_handle->base_addr);
      |^
../../../../../../opal/mca/btl/ofi/btl_ofi_rdma.c: In function 'mca_btl_ofi_put':
../../../../../../opal/mca/btl/ofi/btl_ofi_rdma.c:132:40: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  132 | remote_address = (remote_address - (uint64_t) remote_handle->base_addr);
      |^
cc1: some warnings being treated as errors
make[4]: *** [Makefile:1946: btl_ofi_rdma.lo] Error 1
make[4]: Leaving directory '/tmp/buildd/openmpi-4.1.6/debian/build-gfortran/opal/mca/btl/ofi'