Eike Rathke wrote:
Hi Regina,
I postponed this reply far too many times.. sorry.
Too many questions at once ;-)
It seems to be one of the most difficult functions. To get an idea of
that function, have a look at the file toms708.c of the R-project.
On Saturday, 2008-07-19 00:35:29 +0200, Regina Henschel wrote:
(2)
Nearly all terms inside have parts like (1-x)^r. When the x argument is
close to 1, approximately x > 0.9999, the term (1-x)^r has large
cancellation errors. I know of no way to avoid it. Switching to the power
series 1+x+x^2+x^3+... is no solution, because it hardly converges
for x near 1.
That leads to the question: which accuracy should the function have in
which part of the domain? My suggestion would be, for x > 0.9999, not
to try to get more accuracy but to document the loss of accuracy in the
application help.
Do you think this is a general problem for this particular function, or
is it related to the algorithm used? Would there be not-overly-
sophisticated algorithms that eliminate the accuracy loss?
Please take a spreadsheet, enter x1=0.999999+6.1E-016 and
x2=0.999999-4.9E-016, and then calculate 1-x1 and 1-x2. That is the
range of inputs that all give 0.999999 in the 15-digit format Calc
uses. So I now think that it cannot be solved by a program that is
bound to double. The beta distribution is affected by cancellation
errors more than other functions, because its definition contains "x"
as well as "1-x".
If I understand the Gnumeric code correctly, they use long double
values where possible. MSVC Express, however, doesn't have a true long
double; it maps it to double. I would like to compare Calc with a
Gnumeric build compiled without long double support.
If not, and
it is a general problem, we could state that in the ODFF spec as well.
Stating it in the application help of course would be good anyway.
I think it is a general problem, but I'm no expert in number formats.
(3)
For x near alpha/(alpha+beta), which is the mean, the loops need a huge
number of iterations. I counted more than 50000 in some cases. Currently
the algorithm allows these 50000 iterations, and the accuracy reaches up
to 12 digits. Limiting the number of iterations to a reasonable value
loses accuracy in those cases. The normal number of iterations is below 50.
That sounds reasonable enough for normal usage, doesn't it? How likely
would the condition "x near alpha/(alpha+beta)" occur in real data?
I don't know, but calculating the value at the mean is a common
request. It is surely more realistic than values near 1, which will
simply be treated as 1.
I tried shifting the parameters up and down - like I_x(alpha+1,beta) -
but then the accuracy decreases. The problem gets worse when alpha is
large and beta is small, which gives a mean near 1, so problem (2) hits
in addition.
If someone knows a solution that gives more accuracy with fewer
iterations, please let me know. I failed with the algorithm BASYM from
Didonato, likely because of the needed erfc function.
It seems that I can solve the erfc problem :) I need to learn about
"static methods" (see the other thread).
So again MSVC seems to be lacking a C99 mathematical function. Given an
erfc() function the BASYM algorithm would do fine?
I have now implemented BASYM directly into ScTTT. The accuracy is over
10 digits, sometimes up to 15 digits, and it calculates with fewer than
1000 iterations. So the accuracy is about the same as with the
continued fraction, but with far fewer iterations.
I still have to figure out for which domain it can be used safely,
because outside of it, for example for alpha<1, it gives wrong results.
Didonato uses it for 0.97*p < x and 100 < alpha < beta.
In a test as BASIC macro I got only 6 digit accuracy.
That doesn't seem to be sufficient if other algorithms would give much
better accuracy.
I have used implementation tricks that are mentioned in the article,
and I use the beta function directly, which is not possible in Basic.
In C++ the accuracy is better now.
There will be a new book [1] about the numerics of special functions at
the end of July, and I hope to find some solutions there. But by the
time I get the book via the public library, read it, and test
algorithms, it will be too late to get a solution into OOo3.1.
What to do? Setting a lower limit?
We could do with a lower limit for OOo3.1 if indeed the 50000 iterations
are hit too many times and turn out to be a bottleneck, and once you
read the book ;) improve on the algorithm for OOo3.2.
Using the function in a spreadsheet, I notice no performance problems
with that limit of iterations. But a real performance test would be
necessary to find an acceptable limit.
I'll try to finish it in a way that BASYM is used where needed.
Unfortunately the summer holidays are over and my vacation ends.
Implement some shifting, which will
decrease the iterations in many cases, but give less accuracy?
I don't think this would be a good solution, given what you mentioned
above about "a mean near 1 and the problem (2) hits in addition".
Return the reached values although they are not as accurate as others,
or set a "no convergence" error?
Is there a runtime measurement for "not as accurate as others" that
could be used to determine a "no convergence" case opposed to "this
might still do"?
No, I see none. The current implementation uses the continued fraction
solution, and most of its inaccuracy doesn't come from a wrong
algorithm but from a stop criterion that breaks much too early.
(Besides trying to solve the accuracy problems in the special ranges,
my contribution is the density function.)
I mean, some people already freak out if the 9th
significant digit isn't correct, others don't even care about the 6th
digit.
:)
(4)
Which values of alpha and beta should be supported? The larger they
are, the smaller the range in which the result goes from near 0 to near
1, so a single machine number for x would cover a large range of
"correct" results. I have no experience in using BETADIST in real life,
but I doubt that something like alpha=20000 is really needed.
I actually have no idea :-( Maybe someone on the list could help out
with some expertise?
The spec should give a recommendation. MuPAD, the CAS I have at home,
overflows for large parameters. Of course, high-end programs like
Mathematica have no problems there (you can use it online for single
function calls).
(5)
The spec says that the "Cumulative" parameter has type "logical". In
which type is it pushed to the stack? How shall I get it from the stack?
Use ScInterpreter::GetBool() if the parameter is present, in this case
if ((nParamCount = GetByte()) >= 6)
OK.
No problems, but ToDo's:
(6)
Adapt the algorithms to the solution concerning expm1 and log1p.
Added a dependency to i91602.
Without them it is impossible to get the desired accuracy.
I know it is a very long mail and you need not answer. I'll attach the
version with BASYM to the issue, but it will take some time until it is
ready. I'll ask if I get stuck.
kind regards
Regina