https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #11 from Alexander Monakov ---
Yes, that.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #10 from Tom de Vries ---
(In reply to Alexander Monakov from comment #8)
> No, -msoft-stack-reserve-local is really meant to be in bytes: it may not
> exceed the amount of .local memory reserved by CUDA driver (which is just 1-2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #9 from Tom de Vries ---
(In reply to Tom de Vries from comment #2)
> Minimal version (without inlining sinf code from newlib):
> ...
> /* { dg-additional-options "-lm -foffload=-lm" } */
>
> #define N 1
>
> int
> main (void) {
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #8 from Alexander Monakov ---
No, -msoft-stack-reserve-local is really meant to be in bytes: it may not
exceed the amount of .local memory reserved by CUDA driver (which is just 1-2
KB, unless overridden via cuCtxSetLimit, which
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #7 from Tom de Vries ---
(In reply to Alexander Monakov from comment #6)
> (In reply to Tom de Vries from comment #4)
> > So, I think calling functions from simd code is atm not supported for nvptx.
> >
> > Stack variables in simd
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
Alexander Monakov changed:
What |Removed |Added
CC   |        |amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #5 from Tom de Vries ---
FWIW, another aspect here is convergence (as usual).
Looking at the SASS code for main$_omp_fn$0$impl, I don't find evidence for the
usual divergence/convergence ops (SSY/SYNC), which might mean that the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #4 from Tom de Vries ---
So, I think calling functions from simd code is atm not supported for nvptx.
Stack variables in simd code are mapped on a per-thread stack rather than on
the usual per-warp stack.
The functions are compiled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #3 from Tom de Vries ---
[ Note, this is with GOMP_NVPTX_JIT=-O0. ]
In sinf, we have:
...
45:return -__kernel_cosf(y[0],y[1]);
...
which translates to:
...
.loc 1 45 12
ld.f32 %r67,[%frame+4];
ld.f32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #2 from Tom de Vries ---
Minimal version (without inlining sinf code from newlib):
...
/* { dg-additional-options "-lm -foffload=-lm" } */
#define N 1
int
main (void) {
  float k[N];
  float res;
  for (int i = 0; i < N; i++)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203
--- Comment #1 from Tobias Burnus ---
Besides PR95654, see PR81778 and PR80053.