Thanks for all your suggestions.
I just generated a pull request
https://github.com/scheme-requests-for-implementation/srfi-231/pull/43
to add an implementation of f16-storage-class, plus tests, to the sample
implementation.
Any comments or further suggestions would be appreciated.
The implementation assumes at least 32-bit fixnums and 64-bit IEEE
double flonums, and uses flilogb, flscalbn, and an internal Gambit
procedure ##flcopysign, which do the same things as their C
counterparts. Otherwise, it's fairly straightforward.
I'm still thinking about the f8-storage-class issue.
Brad
On 3/14/23 6:39 PM, Shiro Kawai wrote:
Gauche also supports f16 numeric vectors.
The conversion routines are in C. Writing them in portable Scheme may be
a bit of a challenge (you'll probably need to assume IEEE double
representation for flonums, at least):
https://github.com/shirok/Gauche/blob/master/src/number.c#L492
https://github.com/shirok/Gauche/blob/master/src/number.c#L469
On Tue, Mar 14, 2023 at 6:24 AM John Cowan <[email protected]> wrote:
There's also a C version at
<https://www.mathworks.com/matlabcentral/fileexchange/23173-ieee-754r-half-precision-floating-point-converter>.
On Tue, Mar 14, 2023 at 12:22 PM John Cowan <[email protected]> wrote:
Unlike f8, f16 does have an IEEE standard, so it would be
possible to use the same general strategy I proposed for f8 to
provide a single f16 representation without any need for
f16vectors or f16 hardware support. There is a Java version of
the necessary converters at
<https://stackoverflow.com/questions/6162651/half-precision-floating-point-in-java/6162687>;
it doesn't require anything Java-specific but is pure bit-diddling code. The opposite
side is float32 rather than float64, but that is easily changed. As the article points
out, a lookup table of 64K floats is also a plausible implementation. Doing one or both of
these things pushes the implementation closer to the spec.
No changes to the SRFI would be required, since the lack of
support for f16 is not documented in the Implementation section.