Hi,
I probably jumped in too far, too fast, without a life jacket.
Consider the program below, which defines some parameters of the IEEE 754
binary32 and binary64 floating-point numbers that correspond to real(32)
and real(64) numbers.
Two groups of functions are defined: one for real(32) and a hopefully
orthogonal set for real(64). A third group works for a real(?) number
of either size.
I have defined the first two groups as 'param' because I know the values.
Sorry, thought I knew the values.
The trivial program runs fine with the type R defined as real(64) in main().
Changing R to be real(32) sends it seriously off the deep end. So I
stripped param from the first batch and it works.
I notice that
a) Changing the type definition R to be real(32) appears to imply
that x:real(32) is a run-time conversion, and ditto for x:int(32).
How do I define a 32-bit real that is a compile-time constant, i.e.
the equivalent of C's
((float) 1.2345)
? Having that as a compile-time constant would be nice. If that is not possible, Chapel needs
param t = 1.2345f;
whereby one can assert that
t.type == real(32)
This would mean that the real(32) procs can then have definitions
identical to those of the real(64) procs.
The same applies to binary floating-point constants such as 0x1.0p-23.
Actually, my preference is that a type conversion of a compile-time
constant should result in another compile-time constant.
Once this issue is resolved, every proc in the first group
should then be able to be defined as a param. Am I correct?
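For comparison, the C behaviour referred to in (a) can be checked with a small C11 sketch. It uses _Generic to report the static type of an expression. The FP_BITS_OF_TYPE macro and the variable names are illustrative only. In C, both 1.2345f and (float) 1.2345 are constant expressions of type float, and a suffixed hexadecimal constant is a float too. The unsuffixed literal is a double. The sketch says nothing about how Chapel behaves.

```c
/* Report the static type of an expression (C11 _Generic):
   32 for float, 64 for double, 0 for anything else. */
#define FP_BITS_OF_TYPE(x) _Generic((x), float: 32, double: 64, default: 0)

/* Both initializers are arithmetic constant expressions, which C requires
   for objects with static storage duration, so both are evaluated at
   translation time: the suffix form and the cast form. */
static const float t_suffix = 1.2345f;        /* literal of type float        */
static const float t_cast   = (float) 1.2345; /* double literal cast to float */

/* A binary (hexadecimal) floating constant with an f suffix is a float. */
static const float eps32 = 0x1.0p-23f;
```

Here FP_BITS_OF_TYPE(1.2345f) and FP_BITS_OF_TYPE((float) 1.2345) both yield 32, while FP_BITS_OF_TYPE(1.2345) yields 64.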
b) When do real numbers become citizens of full standing? While
param _1p52 = 1 << 52;
is a compile-time expression, trying to do
param bad = 1.0 / 0x1.0p52;
fails. From my reading of section 8.4.1 of the specification, parameter
expressions can be applications of the binary operators +, -, *,
/, **, ==, !=, <=, >=, <, and > on operands that are real,
imaginary or complex parameter expressions.
I think my use complies, but obviously it does not. What have I done wrong
above, or have I just misread things? Do I need some option to
enable this feature?
Once this issue is resolved, every proc in the third group
should then be able to be defined as a param. Am I correct?
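The expression in (b) is an ordinary constant expression in C. The minimal C sketch below (variable names are illustrative) shows the value the param should produce: 2^-52, which is the binary64 machine epsilon DBL_EPSILON. Whether Chapel accepts the expression as a param is the open question above.

```c
#include <float.h>

/* 1.0 / 0x1.0p52 is an arithmetic constant expression, so it may
   initialise an object with static storage duration. Dividing by an
   exact power of two is exact here, so the result is exactly 2^-52. */
static const double inv_2p52 = 1.0 / 0x1.0p52;

/* The same value written directly as a binary (hexadecimal) constant. */
static const double eps64 = 0x1.0p-52;
```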
For now, I have some routines in C that I would like to use in place of
fpReal??ToRaw and fpRawToReal?? in my code. They do type punning, which is
not possible in Chapel itself. I can sympathize with that, if
only because I cannot think of an elegant, generic way to do it at the
moment.
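Below is a minimal C sketch of what the four routines might look like, matching the commented-out extern declarations in the program. It assumes that Chapel's real(64)/real(32) and uint(64)/uint(32) correspond to C's double/float and uint64_t/uint32_t. It also assumes the floating types are IEEE 754 binary64/binary32 with the same byte order as the integer types. The punning uses memcpy because, unlike a pointer cast, it does not break C's strict-aliasing rules. Compilers usually reduce it to a plain register move. This does not address how to link the object file.

```c
#include <stdint.h>
#include <string.h>

/* Refuse to compile if the sizes do not match the assumed formats. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Raw bit pattern of a binary64 value. */
uint64_t fpReal64ToRaw(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Raw bit pattern of a binary32 value. */
uint32_t fpReal32ToRaw(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* binary64 value with the given bit pattern. */
double fpRawToReal64(uint64_t u)
{
    double x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* binary32 value with the given bit pattern. */
float fpRawToReal32(uint32_t u)
{
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}
```

For example, fpReal64ToRaw(1.0) gives 0x3FF0000000000000 and fpReal32ToRaw(1.0f) gives 0x3F800000.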
If I compile them into a C '.o' file, how do I link them? I would rather not
use LLVM for the moment if possible, as none of the machines to which I
have easy access has a sufficiently recent revision of CMake installed.
// This program is broken
//
// Note : VDS = Visible Digits in the Significand
// IEEE754 binary32 Model
proc fpOneBit(type T) param where T == real(32) return 1:uint(32);
proc fpBias(type T) param where T == real(32) return 0x7f:int(32);
proc fpEinfB(type T) param where T == real(32) return 0xff:uint(32);
proc fpEmax(type T) param where T == real(32) return +127:int(32);
proc fpEmin(type T) param where T == real(32) return -126:int(32);
proc fpVDS(type T) param where T == real(32) return 23:uint(32);
proc fpEpsISO(type T) param where T == real(32) return 0x1.0p-23:real(32);
proc fpVmax(type T) param where T == real(32) return 0x1.0p+127:real(32);
proc fpVmin(type T) param where T == real(32) return 0x1.0p-126:real(32);
proc fpDekkerSplit(type T) param where T == real(32) return 0x1.0p12:real(32);
// IEEE754 binary64 Model
proc fpOneBit(type T) param where T == real(64) return 1:uint(64);
proc fpBias(type T) param where T == real(64) return 0x3ff:int(64);
proc fpEinfB(type T) param where T == real(64) return 0x7ff:uint(64);
proc fpEmax(type T) param where T == real(64) return +1023:int(64);
proc fpEmin(type T) param where T == real(64) return -1022:int(64);
proc fpVDS(type T) param where T == real(64) return 52:uint(64);
proc fpEpsISO(type T) param where T == real(64) return 0x1.0p-52:real(64);
proc fpVmax(type T) param where T == real(64) return 0x1.0p+1023:real(64);
proc fpVmin(type T) param where T == real(64) return 0x1.0p-1022:real(64);
proc fpDekkerSplit(type T) param where T == real(64) return 0x1.0p27:real(64);
// IEEE754 binary? Model
proc fpNegMsk(type T) return fpOneBit(T) << (numBits(T) - 1);
proc fpInfRaw(type T) return fpEinfB(T) << fpVDS(T);
proc fpHuge(type T) return fpVmax(T) * (2.0:T - fpEpsISO(T));
proc fpTiny(type T) return fpVmin(T) * fpEpsISO(T);
proc fpDekker(type T) return fpDekkerSplit(T) + 1.0:T;
// These next 4 routines need to be flicked and replaced with
//
// extern proc fpReal64ToRaw(x : real(64)) : uint(64);
// extern proc fpReal32ToRaw(x : real(32)) : uint(32);
// extern proc fpRawToReal64(x : uint(64)) : real(64);
// extern proc fpRawToReal32(x : uint(32)) : real(32);
//
// These routines will be written in C - how do I link them??
proc fpReal64ToRaw(x : real(64)) : uint(64)
{
return 1234:uint(64);
}
proc fpReal32ToRaw(x : real(32)) : uint(32)
{
return 1234:uint(32);
}
proc fpRawToReal64(x : uint(64)) : real(64)
{
return 789.0:real(64);
}
proc fpRawToReal32(x : uint(32)) : real(32)
{
return 789.0:real(32);
}
proc fpRawToReal(x : uint(32)) : real(32) return fpRawToReal32(x);
proc fpRawToReal(x : uint(64)) : real(64) return fpRawToReal64(x);
proc fpRealToRaw(x : real(32)) : uint(32) return fpReal32ToRaw(x);
proc fpRealToRaw(x : real(64)) : uint(64) return fpReal64ToRaw(x);
module T
{
proc main()
{
param _1p52 = 1 << 52;
var bad = 1.0 / 0x1.0p52;
// param bad = 1.0 / _1p52;
type R = real(64);
type U = uint(numBits(R));
var x : R = fpRawToReal(10:U);
var t = 2.0:R;
t -= fpEpsISO(t.type);
t -= 2.0:R;
writeln("Epsilon(64) ", fpEpsISO(real(64)), " matches ", bad);
writeln("p = ", fpVDS(x.type), " for real(?) where ? = ",
numBits(R));
writeln("Largest! Normal Float ", fpHuge(x.type));
writeln("Smallest Normal Float ", fpVmin(x.type));
writeln("Smallest Actual Float ", fpTiny(x.type));
writeln("Sign Bit Mask is ", fpNegMsk(x.type));
writeln("Inf. Raw Bits is ", fpInfRaw(x.type));
writeln("Dummy call to check : ", fpRealToRaw(t));
writeln("Dummy call to check : ", fpRawToReal(_1p52));
}
}
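The parameter values in the first two groups, and the derived quantities in the third, can be cross-checked against C's <float.h>. fpDekker is taken here as fpDekkerSplit + 1. Below is a small C11 sketch, assuming IEEE 754 float/double. The F32_*/F64_* macros and f32*/f64* variables are illustrative stand-ins for the Chapel procs, not part of the program above.

```c
#include <float.h>
#include <stdint.h>

/* binary32 parameters (first group). */
#define F32_BIAS  0x7f
#define F32_EINFB 0xffu
#define F32_EMAX  (+127)
#define F32_EMIN  (-126)
#define F32_VDS   23
#define F32_EPS   0x1.0p-23f
#define F32_VMAX  0x1.0p+127f
#define F32_VMIN  0x1.0p-126f
#define F32_SPLIT 0x1.0p12f

/* binary64 parameters (second group). */
#define F64_BIAS  0x3ff
#define F64_EINFB UINT64_C(0x7ff)
#define F64_EMAX  (+1023)
#define F64_EMIN  (-1022)
#define F64_VDS   52
#define F64_EPS   0x1.0p-52
#define F64_VMAX  0x1.0p+1023
#define F64_VMIN  0x1.0p-1022
#define F64_SPLIT 0x1.0p27

/* Derived values (third group); all are exact in IEEE arithmetic. */
static const uint32_t f32NegMsk = UINT32_C(1) << 31;           /* fpNegMsk */
static const uint32_t f32InfRaw = F32_EINFB << F32_VDS;          /* fpInfRaw */
static const float    f32Huge   = F32_VMAX * (2.0f - F32_EPS);  /* fpHuge   */
static const float    f32Tiny   = F32_VMIN * F32_EPS;           /* fpTiny   */
static const float    f32Dekker = F32_SPLIT + 1.0f;              /* fpDekker */

static const uint64_t f64NegMsk = UINT64_C(1) << 63;
static const uint64_t f64InfRaw = F64_EINFB << F64_VDS;
static const double   f64Huge   = F64_VMAX * (2.0 - F64_EPS);
static const double   f64Tiny   = F64_VMIN * F64_EPS;
static const double   f64Dekker = F64_SPLIT + 1.0;

/* Note: <float.h> describes significands in [0.5, 1), so FLT_MAX_EXP and
   FLT_MIN_EXP (and the DBL_ versions) are one larger than Emax and Emin. */
```

With these definitions, fpHuge matches FLT_MAX/DBL_MAX and fpTiny is the smallest subnormal, 2^-149 or 2^-1074.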
Regards - Damian
Pacific Engineering Systems International, 277-279 Broadway, Glebe NSW 2037
Ph:+61-2-8571-0847 .. Fx:+61-2-9692-9623 | unsolicited email not wanted here
Views & opinions here are mine and not those of any past or present employer
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers