Eike Rathke wrote:
Hi Regina,
I postponed this reply far too many times.. sorry.
Too many questions at once ;-)
It seems to be one of the most difficult functions. To get an idea of
that function, have a look at the file toms708.c of the R-project.
On Saturday, 2008-07-19 00:35:29 +0200, Regina Henschel wrote:
(2)
Nearly all terms inside have parts like (1-x)^r. When the x argument is
close to 1, approximately x > 0.9999, the term (1-x)^r has large
cancellation errors. I know of no way to avoid it. Switching to the power
series 1+x+x^2+x^3+... is no solution, because it hardly converges
for x near 1.
That leads to the question: which accuracy should the function have in
which part of the domain? My suggestion would be, for x > 0.9999, not
to try to get more accuracy but to document the loss of accuracy in the
application help.
Do you think this is a general problem for this particular function, or
is it related to the algorithm used? Would there be not-overly-
sophisticated algorithms that eliminate the accuracy loss?
Please take a spreadsheet, enter x1=0.999999+6.1E-016 and
x2=0.999999-4.9E-016, and then calculate 1-x1 and 1-x2. That is the
range of inputs that all give 0.999999 in the 15-digit format Calc
uses. So I now think that it cannot be solved by a program that is
bound to double. The beta distribution is affected by cancellation
errors more than other functions, because its definition contains "x"
as well as "1-x".
If I understand the Gnumeric code correctly, they use long double
values where possible. MSVC Express, however, doesn't have a true long
double; it maps it to double. I would like to compare Calc with a
Gnumeric build compiled without long double support.
If not, and
it is a general problem, we could state that in the ODFF spec as well.
Stating it in the application help of course would be good anyway.
I think it is a general problem, but I'm no expert in number formats.
(3)
For x near alpha/(alpha+beta), which is the mean, the loops need a huge
number of iterations. I counted more than 50000 in some cases. Currently
the algorithm allows these 50000 iterations, and the accuracy reaches up
to 12 digits. Limiting the number of iterations to a reasonable value
loses accuracy in those cases. The normal number of iterations is below 50.
That sounds reasonable enough for normal usage, doesn't it? How likely
would the condition "x near alpha/(alpha+beta)" occur in real data?
I don't know, but calculating the value at the mean is a common
request. It is surely more realistic than values near 1, which will
simply be treated as 1.
I tried shifting the parameters up and down - like I_x(alpha+1,beta) -
but then the accuracy decreases. The problem gets worse when alpha is
large and beta is small, which gives a mean near 1, so problem (2) hits
in addition.
If someone knows a solution that gives more accuracy with fewer
iterations, please let me know. I failed with the algorithm BASYM from
Didonato, likely because of the needed erfc function.
It seems that I can solve the erfc problem :) I need to learn about
"static methods" (see the other thread).
So again MSVC seems to be lacking a C99 mathematical function. Given an
erfc() function the BASYM algorithm would do fine?
I have now implemented BASYM directly into ScTTT. The accuracy is over
10 digits, sometimes up to 15 digits, and it calculates with fewer than
1000 iterations. So the accuracy is about the same as with the
continued fraction, but with far fewer iterations.
I still have to figure out for which domain it can be used safely,
because outside of it, for example for alpha<1, it gives wrong results.
Didonato uses it for 0.97*p < x and 100 < alpha < beta.
In a test as BASIC macro I got only 6 digit accuracy.
That doesn't seem to be sufficient if other algorithms would give much
better accuracy.
I have used implementation tricks that are mentioned in the article,
and I use the beta function directly, which is not possible in Basic.
In C++ the accuracy is better now.
There will be a new book [1] about the numerics of special functions at
the end of July, and I hope to find some solutions there. But by the
time I get the book via the public library, read it, and test
algorithms, it will be too late to get a solution into OOo3.1.
What to do? Setting a lower limit?
We could do with a lower limit for OOo3.1 if indeed the 50000 iterations
are hit too many times and turn out to be a bottleneck, and once you
read the book ;) improve on the algorithm for OOo3.2.
Using the function in a spreadsheet, I notice no performance problems
with that limit of iterations. But a real performance test would be
necessary to find an acceptable limit.
I'll try to finish it in a way that BASYM is used where needed.
Unfortunately the summer holidays are over and my vacation ends.
Implement some shifting, which will
decrease the iterations in many cases, but give less accuracy?
I don't think this would be a good solution, given what you mentioned
above about "a mean near 1 and the problem (2) hits in addition".
Return the reached values although they are not as accurate as others,
or set a "no convergence" error?
Is there a runtime measurement for "not as accurate as others" that
could be used to determine a "no convergence" case opposed to "this
might still do"?
No, I see none. The current implementation uses the continued fraction
solution, and most of its inaccuracy doesn't come from a wrong
algorithm but from a stop criterion that breaks much too early.
(Besides trying to solve the accuracy problems in the special ranges,
my contribution is the density function.)
I mean, some people already freak out if the 9th
significant digit isn't correct, others don't even care about the 6th
digit.
:)
(4)
Which values of alpha and beta should be supported? The larger they
are, the smaller the range in which the result goes from near 0 to near
1, so a single machine number for x would cover a large range of
"correct" results. I have no experience in using BETADIST in real life,
but I doubt that something like alpha=20000 is really needed.
I actually have no idea :-( Maybe someone on the list could help out
with some expertise?
The spec should give a recommendation. MuPAD, the CAS I have at home,
overflows for large parameters. Of course, high-end programs like
Mathematica have no problems there (you can use it online for single
function calls).
(5)
The spec says that the "Cumulative" parameter has type "logical". In
which type is it pushed to the stack? How shall I get it from the stack?
Use ScInterpreter::GetBool() if the parameter is present, in this case
if ((nParamCount = GetByte()) >= 6)
OK.
No problems, but ToDo's:
(6)
Adapt the algorithms to the solution concerning expm1 and log1p.
Added a dependency to i91602.
Without them it is impossible to get the desired accuracy.
I know it is a very long mail and you need not answer. I'll attach the
version with BASYM to the issue, but it will take some time until it is
ready. I'll ask if I get stuck.
kind regards
Regina