Some of us (game developers especially) greatly prefer a minor inaccuracy to a potentially major slowdown; I would personally opposed this change, as you're noticeably increasing the cost of something that's used heavily in tightly looped code. Perhaps an appropriately named #ifdef switch would be a way to please everyone here?
Regards, Riot On 7 September 2016 at 16:21, lhmouse <lh_mo...@126.com> wrote: > (I don't write AT&T assembly so I am unable to make a patch. > Nevertheless I hope someone who writes AT&T assembly could fix it.) > > The x87 `sinl` instruction has been suffering from an accuracy problem > since decades ago, which is described in this article: > https://software.intel.com/blogs/2014/10/09/fsin- > documentation-improvements-in-the-intel-64-and-ia-32- > architectures-software > > Long story short: Before we can calculate `sin(x)`, we have to reduce `x` > such that it falls in (-π/2,π/2]. This can be easily done via dividing `x` > with π and getting the remainder. The problem is that, instead of > a reasonably accurate value of π, `FSIN` uses a 66-bit approximate value > as the divisor, which makes the result very inaccurate if `x` is proximate > to > some multiple of π, because the remainder would end up with most of > its upper bits being zero and very few significant bits left. > > To compromise with Intel people, as the article suggests, it is essential > to reduce `x` before executing the `fsin` instruction. This is done as > follows: > > 1. Use `FLDPI` instruction to get an accurate value of π. > 2. Run `FPREM1` instruction repeatly until the _C2_ bit in FPU status word > is cleared. The result remainder will be in (-π/2,π/2], and > the _C0_,_C3_,_C1_ bits are the least three significant bits of > the quotient, from left to right. > 3. Calculate the sine value using `FSIN` instruction. This never fails. > 4. Acknowledging that `sin(x) = -sin(x+kπ)` when `k` is odd and > `sin(x) = sin(x+kπ)` when `k` is even, because the parity bit of the > quotient > is the _C1_ bit in the FPU status word, if it is set, negate the > result with > `FCHS` instruction. We get the sine value now. > > The above process is the same for cosine. > In the case of tangent, step 4 should be removed. > > > The following code fragment compares `sinl` from mingw-w64 and > my own implementation: > > volatile auto one = 1.0l; > auto theta = atanl(one) * 4; // This function is from mingw-w64. > std::printf("sinl (theta) = %.16Le\n", sinl (theta)); > std::printf("my_sinl(theta) = %.16Le\n", my_sinl(theta)); > > It produces the following result: > > sinl (theta) = 1.6263032587282567e-019 > my_sinl(theta) = -0.0000000000000000e+000 > > My implementation could be found here: > https://github.com/lhmouse/MCF/blob/master/MCFCRT/src/stdc/math/sin.c#L12 > > static inline long double fpu_sin(long double x){ > unsigned fsw; > const long double reduced = __MCFCRT_fremainder(&fsw, x, > __MCFCRT_fldpi()); > long double ret = __MCFCRT_fsin_unsafe(reduced); > if(fsw & 0x0200){ > ret = __MCFCRT_fneg(ret); > } > return ret; > } > > -------------- > Best regards, > lh_mouse > 2016-09-07 > > > ------------------------------------------------------------ > ------------------ > _______________________________________________ > Mingw-w64-public mailing list > Mingw-w64-public@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/mingw-w64-public > ------------------------------------------------------------------------------ _______________________________________________ Mingw-w64-public mailing list Mingw-w64-public@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/mingw-w64-public