Thanks for all your suggestions.

I just generated a pull request

https://github.com/scheme-requests-for-implementation/srfi-231/pull/43

to add an implementation of f16-storage-class, plus tests, to the sample implementation.

Any comments or further suggestions would be appreciated.

The implementation assumes at least 32-bit fixnums and 64-bit IEEE double flonums, and uses flilogb, flscalbn, and an internal Gambit procedure ##flcopysign, which do the same things as their C counterparts. Otherwise, it's fairly straightforward.

I'm still thinking about the f8-storage-class issue.

Brad

On 3/14/23 6:39 PM, Shiro Kawai wrote:

Gauche also supports f16 numeric vectors.
The conversion routines are in C.  Writing them in portable Scheme may be a bit of a challenge (you'll probably need to assume IEEE double representation for flonums, at least).

https://github.com/shirok/Gauche/blob/master/src/number.c#L492
https://github.com/shirok/Gauche/blob/master/src/number.c#L469


On Tue, Mar 14, 2023 at 6:24 AM John Cowan <[email protected]> wrote:

    There's also a C version at
    <https://www.mathworks.com/matlabcentral/fileexchange/23173-ieee-754r-half-precision-floating-point-converter>.

    On Tue, Mar 14, 2023 at 12:22 PM John Cowan <[email protected]> wrote:

        Unlike f8, f16 does have an IEEE standard, so it would be
        possible to use the same general strategy I proposed for f8 to
        provide a single f16 representation without any need for
        f16vectors or f16 hardware support.  There is a Java version of
        the necessary converters at
        <https://stackoverflow.com/questions/6162651/half-precision-floating-point-in-java/6162687>;
        it doesn't require anything Java-specific but is pure
        bit-diddling code.  The opposite side is float32 rather than
        float64, but that is easily changed.  As the article points out,
        a lookup table of 64K floats is also a plausible implementation.
        Doing one or both of these things pushes the implementation
        closer to the spec.

        No changes to the SRFI would be required, since the lack of
        support for f16 is not documented in the Implementation section.

