Hi Regina,

On Tuesday, 2008-08-05 22:17:56 +0200, Regina Henschel wrote:

> It seems to be one of the most difficult functions. To get an idea of  
> that function, have a look at the file toms708.c of the R-project.

Wow, impressive. The original coder(s) must have spent a lot of time on
numeric error analysis there.

>>> (2)
>>> Nearly all terms inside have parts like (1-x)^r. When the x argument
>>> is close to 1, approximately x > 0.9999, the term (1-x)^r suffers
>>> large cancellation errors. I know no way to avoid it. Switching to
>>> the power series 1+x+x^2+x^3+... is no solution, because it barely
>>> converges for x near 1.
>>>
>>> That leads to the question: what accuracy should the function have
>>> in which part of the domain? My suggestion would be, for x > 0.9999,
>>> not to try to get more accuracy, but to document the loss of
>>> accuracy in the application help.
>>
>> Do you think this is a general problem for this particular function, or
>> is it related to the algorithm used? Would there be a not overly
>> sophisticated algorithm that eliminates the accuracy loss?
>
> Please take a spreadsheet, enter x1=0.999999+6.1E-016 and
> x2=0.999999-4.9E-016, and then calculate 1-x1 and 1-x2. That is the
> range of inputs that all give 0.999999 in the 15-digit format Calc
> uses.

Understood.

> So by now I think that it cannot be solved with a program that is
> bound to double.

Note that the trick Leonard mentioned in his reply indeed gives
identical results for 0.5-x1+0.5 and 0.5-x2+0.5 when used on raw double
values in C/C++, assigned to a double and displayed using printf; it
does not in the result of a spreadsheet formula expression, though. The
same happens in Calc and Gnumeric, so it is not due to some Calc
rounding madness.

> The beta distribution is  
> affected by cancellation errors more than other functions, because in  
> its definition you have "x" as well as "1-x".
>
> If I understand the Gnumeric code correctly, they use long double
> values where possible. MSVC Express, however, doesn't have a real long
> double, but maps it to double.

Sigh... not only MSVC Express, but also the .NET compilers. They gave up
on it after the 16-bit era, claiming it was for compatibility with
non-x86 machines.

> I would like to compare Calc with a Gnumeric, which is  
> compiled without support of long double.

I'm not sure whether any gcc ever had a switch for mapping long double
to double; there was a -Wno-long-double, but AFAIK that only suppressed
a warning. Recent versions don't have it anymore.

Gnumeric for Windows does exist, but I doubt it would be compilable
using MSVC.

Last resort would be to replace long double with double in the functions
of interest and recompile ...


>>> If someone knows a solution that gives more accuracy with fewer
>>> iterations, please let me know. I failed with the BASYM algorithm
>>> from Didonato, likely because of the erfc function it needs.
>
> It seems that I can solve the erfc problem :) I need to learn about
> "static methods" (see other thread).
>

Heh, that's an easy one :-)

>> So again MSVC seems to be lacking a C99 mathematical function. Given an
>> erfc() function the BASYM algorithm would do fine?
>
> I have implemented BASYM directly in ScTTT now. The accuracy is over
> 10 digits, sometimes up to 15 digits, the same as the continued
> fraction. It calculates with less than 1000 iterations. So the
> accuracy is about the same as the continued fraction, but with fewer
> iterations.

At least 10 digits of accuracy sounds pretty good to me.


>>> [... iterations ...]
>>> What to do? Setting a lower limit?
>>
>> We could do with a lower limit for OOo3.1 if indeed the 50000
>> iterations are hit too often and turn out to be a bottleneck, and once
>> you have read the book ;) improve on the algorithm for OOo3.2.
>
> Using the function in a spreadsheet, I notice no performance problems  
> with that limit of iterations. But a real performance test would be  
> necessary to find an acceptable limit.

For a first attempt, try with a thousand and then several thousand
formula cells depending on one value cell; place a SUM() function
beneath them to force calculation of all formula cells, and change the
value in the value cell.

> I'll try to finish it in a way that BASYM is used where needed.
> Unfortunately the summer holidays are over and my vacation ends.

Too bad..


>>> (4)
>>> Which values of alpha and beta should be supported? The larger they
>>> are, the smaller the range in which the result goes from near 0 to
>>> near 1, so one machine number for x would cover a large range of
>>> "correct" results. I have no experience in using BETADIST in real
>>> life, but I doubt that something like alpha=20000 is really needed.
>>
>> I actually have no idea :-(  Maybe someone on the list could help out
>> with some expertise?
>
> The spec should give a recommendation.

Maybe someone in the formula committee could give some.

> MuPAD, the CAS I have got at home, overflows for large parameters. Of
> course, high-end programs like Mathematica have no problems there (you
> can use it online for single function calls).

Call me blind, I didn't find that. Do you have a URL?


>>> (6)
>>> Adapt the algorithms to the solution concerning expm1 and log1p.
>>
>> Added a dependency to i91602.
>
> Without them it is impossible to get the desired accuracy.

Hopefully the results of the replacement functions and the different
compilers' built-in functions are nearly identical...

> I know it is a very long mail and you need not answer.

I felt I needed to ...

> I'll attach the version with BASYM to the issue, but it will take
> some time until it is ready. I'll ask if I get stuck.

I want to give you every support I can; you're doing such great work
on our functions' accuracy.

Thanks
  Eike

-- 
 PGP/OpenPGP/GnuPG encrypted mail preferred in all private communication.
 Key ID: 0x293C05FD - 997A 4C60 CE41 0149 0DB3  9E96 2F1A D073 293C 05FD
