https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90329

Thomas Koenig <tkoenig at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |toon at moene dot org

--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(A reply to https://gcc.gnu.org/ml/fortran/2019-05/msg00021.html ;
I want to have this in the PR's audit trail).

Toon, I have added you to this because of your experience going back
to g77, and also because you're a member of the steering committee.

> I've debugged one of the packages and I 
> confirm that the breakage is related to passing of strings from C to 
> Fortran. Indeed, BLAS and LAPACK define a large number of subroutines 
> that take one or more explicit single-character strings as arguments. 
> Other than that, BLAS has only one function (xerbla), which takes a 
> string of unspecified length, LAPACK only has four (ilaenv, 
> ilaenv2stage, lsamen, xerbla). The C interfaces to BLAS/LAPACK from 
> Netlib depend on the historic behavior that explicit single-character 
> strings are interoperable, concretely CBLAS and LAPACKE provide C 
> interfaces/code that calls into Fortran BLAS/LAPACK without passing the 
> 1s as lengths of the character strings (functions taking a string of 
> unspecified length are trivial and re-implemented in C).

OUCH. So, basically, people have been depending on C undefined
behavior for ages, and this includes recent developments like
LAPACKE.  Only an accident of calling conventions has kept
this "working".  Oh my...

If we are looking at the System V Application Binary Interface
AMD64 Architecture Processor Supplement, I can understand how
this worked.

I suppose this never worked on Windows stdcall, where the
callee cleans up the stack, so it must know the exact number
of bytes, and the functions are therefore decorated with the
number of bytes in the argument list.

(By the way, interoperability has a special meaning in Fortran,
where it is reserved for BIND(C) procedures and variables).

> This has been 
> working fine for very many years as the Fortran code never needed to 
> access the length it knew was 1. R has been using the same practice, 
> which long predates ISO_C_BINDING/BIND(C), and I've seen online 
> discussions where people assumed interoperability of length 1 strings, 
> once mentioning also a citation from Fortran 2003 Handbook that says "A 
> Fortran character string with a length other than 1 is not 
> interoperable" (which invites interpretation that length 1 strings were 
> ).

They are interoperable in the sense that they can be an argument
to a BIND(C) procedure. This is the terminology used by the
Fortran standard.

> I am not an expert to say whether the current Fortran standard 
> requires that interoperability and I assume that it does not given this 
> gfortran change.

What we are discussing here is outside the Fortran standard's scope.

> This gfortran change breaks this interoperability: if a C function calls 
> a Fortran function, passing it a single-character string for a parameter 
> taking explicit single-character Fortran string, it may crash. I've 
> debugged one case with R package BDgraph, this example 
> "library(BDgraph); data.sim <- bdgraph.sim( n = 70, p = 5, size = 7, vis 
> = TRUE )" crashes due to corruption of C stack by Fortran function 
> DPOSV, when compiled with the new gfortran and with -O2. To see the 
> problem, one can just look at the disassembly of DPOSV (LAPACK), neither 
> the package nor R is not necessary:
> 
> SUBROUTINE DPOSV( UPLO, N, NRHS, A, LDA, B, LDB, INFO )
> CHARACTER          UPLO
> 
> In one case, DPOSV calls DPOTRS before returning. The new gfortran with 
> -O2 performs tail-call optimization, jumping to DPOTRS. In the annotated 
> disassembly snippet, at 11747f1, DPOSV tries to ensure that there is 
> constant 1 as string length of UPLO when tail-calling into DPOTRS, so it 
> writes it to stack where there already should have been 1 as length of 
> UPLO passed to DPOSV. But this argument has not been passed to DPOSV, so 
> this causes stack corruption.

[assembly elided]

> Note that DPOSV never uses the length of the string (UPLO) from the 
> hidden argument, the compiler clearly knows that its length is 1. In 
> calls where the length is passed in registers, this does not cause 
> trouble (like LSAME) and indeed is needed as the registers have 
> different values
> 
>        IF( .NOT.LSAME( UPLO, 'U' ) .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN
> 
>    117448:       b9 01 00 00 00          mov    $0x1,%ecx
> 
>    11744d:       ba 01 00 00 00          mov    $0x1,%edx
> 
>    117452:       48 8d 35 bb 12 09 00    lea    
> 0x912bb(%rip),%rsi        # 1a8714 <ipivot.4261+0xd14>
> 
>    117459:       4c 89 f7                mov    %r14,%rdi
> 
>    11745c:       e8 1f 3d ef ff          callq  b180 <lsame_@plt>
> 
> but it seems to me that the compiler could just refrain from setting the 
> length to be 1 on the stack at 1174f1, since it knows it should have 
> already been there.

That would be a nice feature of tail-call optimiation, which would also
"fix" this bug at the same time.

> It would be a nice property if Fortran code that 
> never accesses the hidden arguments with the lengths of the strings, 
> because it knows what those lengths are, would also never write to those 
> hidden arguments on the stack when it knows what they are (should be).

But it would not be clear if the callee also does not access the
memory.  Hm... I think it might make sense to make a separate
optimzation PR out of this.

> Before the gfortran change, DPOSV would call to DPOTRS normally (no 
> tail-call optimization), so this problem would not occur (I tested with 
> 268974). By disabling tail call optimization via 
> -fno-optimize-sibling-calls, the problem goes away also for other 
> packages my colleagues have identified as crashing with the new 
> gfortran.

OK, I can see that.

> Did you know of any other optimization that could break this 
> interoperability of 1-length strings? It would be really nice to users 
> if this interoperability could be preserved, and if not by default than 
> at least with some option.

LTO is likely to have issues, both positive and negative ones. On the
one hand, you will get warnings about LTO mismatches. On the other
hand, LTO has more information, so it will probably silently correct
the mismatch.

It is certainly worth a _LOT_ to be LTO-clean, especially in a project
like R where people write a lot of add-on code.

(At the moment, you put your version of LAPACK into a shared library,
so LTO cannot be used for this.).

The tail-call issue is a good catch.  I don't know of any other option
which might be affected, but I am also not an expert here. Especially,
gcc and gfortran support very many platforms with (I presume) many
calling conventions, so I am not sure that this will solve all issues.

> Traditionally, BLAS/LAPACK implementations are interchangeable at 
> dynamic linking time, using the Fortran interface that is however also 
> used from C, without passing lengths for fixed 1-character strings. R 
> supports this too, at least on some Linux distributions including 
> Debian/Ubuntu it is packaged so that it runs with the BLAS/LAPACK 
> implementation installed on the system. Even though this is probably not 
> correct wrt to the todays Fortran standard (I don't know for sure), this 
> is the common practice, and fixing this would not be easy - one would 
> have to create a new interface to be used from C, separate from the 
> Fortran one, and all software would have to start using that interface 
> from C. In the current situation when the Fortran interface is used, 
> confusion will arise with this gfortran change as different BLAS/LAPACK 
> implementations are built by different Fortran compilers and use a 
> different mix of Fortran/C for different computational subroutines. Note 
> CBLAS could not be readily used as it itself breaks with the current 
> gfortran change as well.

OK, I see this could be a major problem.

One of the issues is that the old way, calling a non-variadic
function in a variadic way, is also not guaranteed to be free
of trouble.

So, what to do?

It might be a possibility to invent some new flag, let's call it
-fomit-string-length, which simply does not pass the string
lengths. This would be binary compatible with existing calls,
but a major ABI change, and would also probably break libgfortran.

Another possibility would be to add an option like -fexternal-varargs,
which would call procedures without explicit interface with a variadic call.
Not ideal (see above), but it would restore the old behavior.

Somewhat orthogonal to this, it would also be possible to emit
C prototypes for all external subroutines.  I wrote this kind of
thing for BIND(C) procedures and variables, and it is eminently
doable. So any project could just generate the correct prototypes
during compilation.

Hm...

Reply via email to