https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71064

--- Comment #1 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
> (It's generally tuned for speed instead of precision, and does not strive for 
> full IEEE-754 conformance.)

(PTX is an abstract ISA; if it's tuned for anything, it's for simplicity of
abstraction and for matching the underlying GPU ISA well enough.  Why do you
claim it doesn't strive for IEEE-754 conformance?  If you look at the PTX ISA
spec, you'll see remarks on compliance, such as full compliance for +, -, *, /,
sqrt and fma on recent hardware implementations; supporting extended fp types
is optional.)

Compilation can succeed if 'double' and 'long double' happen to be binary
compatible, as would be the case when offloading from ARM.  Otherwise,
diagnosing that as a target compiler error is the right thing to do.  Falling
back is not practical, because in the general case the boundary at which the
double<->long double translation would need to happen is individual memory
accesses in offloaded code.
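
To illustrate what "binary compatible" could mean in practice, here is a
minimal sketch in C that compares the <float.h> parameters of the two types on
the machine it is compiled for.  The helper name long_double_matches_double is
hypothetical, and this is only a host-side check: equal size, precision and
exponent range suggest the same format but do not prove an identical bit
layout, and the check says nothing about the offload target itself.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical helper (not part of GCC): returns true when 'long double'
   has the same size, mantissa width and exponent range as 'double' on the
   machine this is compiled for.  That is a necessary condition for the two
   types to be binary compatible, not a proof of identical layout.  */
static bool long_double_matches_double(void)
{
    return sizeof(long double) == sizeof(double)
        && LDBL_MANT_DIG == DBL_MANT_DIG
        && LDBL_MAX_EXP == DBL_MAX_EXP
        && LDBL_MIN_EXP == DBL_MIN_EXP;
}
```

On a host with an 80-bit or 128-bit 'long double' this would be expected to
return false, which is the case where a diagnostic is appropriate.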

An exception is the situation where data of type 'long double' is never
exchanged across the host-device boundary: all long double variables referenced
in the target region are private to it.  You still need to make sure that
sizeof(long double) matches, but then on memory accesses you can load/store the
leftmost 8 bytes (of 16) into a PTX register and operate on it.
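
The idea of keeping the 16-byte size while only using 8 bytes can be modelled
in plain C.  This is a rough sketch, not what the nvptx backend does: the
names ld_slot, slot_store and slot_load are hypothetical, "leftmost" is assumed
to mean the lowest-addressed bytes, and 'double' is assumed to be 8 bytes wide.

```c
#include <string.h>

enum { SLOT_SIZE = 16, PAYLOAD_SIZE = 8 };

/* Assumption: 'double' occupies exactly 8 bytes.  */
_Static_assert(sizeof(double) == PAYLOAD_SIZE, "double must be 8 bytes");

/* Hypothetical stand-in for a device-private 'long double' object: it keeps
   the 16-byte size so that layouts agree with the host, but only the first
   8 bytes ever carry a value.  */
typedef struct {
    unsigned char bytes[SLOT_SIZE];
} ld_slot;

/* Store: write the 8-byte value into the first 8 bytes of the slot.  */
static void slot_store(ld_slot *slot, double value)
{
    memcpy(slot->bytes, &value, PAYLOAD_SIZE);
}

/* Load: read the first 8 bytes of the slot back as a 'double'.  */
static double slot_load(const ld_slot *slot)
{
    double value;
    memcpy(&value, slot->bytes, PAYLOAD_SIZE);
    return value;
}
```

Because such objects never cross the host-device boundary, the unused upper
8 bytes of each slot do not need to hold anything meaningful.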
