RE: [math] proposed ordering for task list, scope of initial release

2003-06-11 Thread Tim O'Brien
On Tue, 2003-06-10 at 23:26, Brent Worden wrote:
  There are several approaches to designing a concept for exceptions,
  all of which have pros and cons. I personally would suggest avoiding
  returning NaNs and throwing RuntimeExceptions wherever possible, and
  using a package-specific hierarchy of declared exceptions
  instead.
 
  J.Pietschmann
 
 I would agree whole-heartedly.

Returning Double.NaN in situations where it makes sense is a settled
issue; it was addressed about three weeks ago. Please see the previous
discussions on the issue through Eyebrowse.

Tim  




 
 Brent Worden
 http://www.brent.worden.org
 
 
 








RE: [math] proposed ordering for task list, scope of initial release

2003-06-11 Thread Tim O'Brien
On Wed, 2003-06-11 at 00:15, Brent Worden wrote:
 Here's a saying I've used in the past when debating colleagues: Just
 because someone else does something, that doesn't make it right. :)

Please see the previous discussions on the issue in the Eyebrowse
archive, read the relevant IEEE standards, and also see the two PS files
concerning floating-point arithmetic in the commons-math developers
guide.

For more advanced algorithms a checked exception makes sense; for
something like Min(), Max(), returning NaN makes good sense.  Please read
the material in question and submit patches accordingly.
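
A minimal sketch of that split, under hypothetical names (this is not the
actual commons-math API; ConvergenceException is declared here only to keep
the snippet self-contained): a simple aggregate signals bad input with NaN,
while an iterative algorithm declares a checked exception.

    class ConvergenceException extends Exception {
        ConvergenceException(String message) { super(message); }
    }

    final class ErrorSignalingSketch {

        /** Simple aggregate: an empty input yields NaN, as java.lang.Math would. */
        static double max(double[] values) {
            if (values == null || values.length == 0) {
                return Double.NaN; // nothing to compare; no exception needed
            }
            double result = values[0];
            for (int i = 1; i < values.length; i++) {
                result = Math.max(result, values[i]);
            }
            return result;
        }

        /** Iterative algorithm: failure to converge is a declared condition.
            Assumes c > 0. */
        static double sqrtByNewton(double c, int maxIterations)
                throws ConvergenceException {
            double x = c;
            for (int i = 0; i < maxIterations; i++) {
                double next = 0.5 * (x + c / x); // Newton step for x^2 - c = 0
                if (Math.abs(next - x) <= 1e-15 * next) {
                    return next;
                }
                x = next;
            }
            throw new ConvergenceException(
                "no convergence after " + maxIterations + " iterations");
        }
    }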







Re: [math] proposed ordering for task list, scope of initial release

2003-06-11 Thread J.Pietschmann
Al Chou wrote:
So I pulled out Herr Pietschmann's Brent method class and tested it, and it
threw an exception telling me, "Possibly multiple zeros in interval or ill
conditioned function."
Caused by an incomplete and much too naive implementation.
I now have a real implementation of Brent's method (Brent-Dekker)
ready and could try to submit a patch over the weekend.
 - It's easy to outsmart yourself and create code that's too finicky for
non-numericist users.
Non-numericists (or whatever) tend to underestimate the
traps in numerical calculations because the vast majority
of the problems behave well with modern algorithms most
of the time. Unfortunately, unforeseen misbehaviour tends
to come up at the worst possible time, often with the user
barely noticing that something was wrong.
In particular for root finding:
- The function for which a zero is sought could be implemented
  badly, with excessive round-off error and/or bit-cancellation,
  like naive evaluation of dense high order polynomials.
  This may significantly displace the zero point, and it often
  leads to multiple numerical roots where only one was
  analytically expected.
- The function may be inherently or numerically ill conditioned,
  like x*sin(1/x) near zero or ((x-1)^1000)*x^50 for a 50 bit
  mantissa.
- It's hard to know in advance when to trade performance
  for robustness.
  A criterion for root finders is how often the function is
  evaluated, and it is generally assumed this is expensive
  compared to any calculation the solver could make.
  This can make the difference between bisection, which gains a bit
  per evaluation and needs ~53 iterations for an improvement of
  1e-16 in accuracy, whether the function is well behaved or not,
  and Newton, which ideally doubles the correct bits per evaluation
  and needs ~5 iterations (evaluating *two* functions each time) for a
  1e-16 improvement. (A small bisection sketch follows this list.)
  Obviously, if accuracy matters and function evaluation is slow,
  fast algorithms are hard to avoid, but precisely defining the
  necessary accuracy and telling what is slow can be time
  consuming and hair-raising.
- Detailed knowledge about the function (and other aspects of the
  problem) beats all kinds of clever guesses by sophisticated solving
  engines all the time. Most algorithms are only really robust if
  you can provide a bracket for the zero. For general functions,
  this is as hard as or harder than nailing down the root itself.
  If you know the function has a smooth second derivative and
  no zero in the first derivative in a certain interval (like x > x1),
  just use Newton, if necessary with a numerical derivative, or
  the secant method without bracketing, and you'll get your root,
  if it exists.
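
To make the bisection arithmetic above concrete: each pass halves the
bracket, i.e. gains one bit, so shrinking a width-1 interval below 1e-16
takes about 53 evaluations. A minimal sketch against a hypothetical
one-method function interface (not J's actual framework):

    interface UnivariateFunction {
        double value(double x);
    }

    final class BisectionSketch {

        /** Requires f(a) and f(b) to have opposite signs (a valid bracket). */
        static double bisect(UnivariateFunction f, double a, double b, double tol) {
            double fa = f.value(a);
            while (b - a > tol) {              // one bit of accuracy per pass
                double m = a + 0.5 * (b - a);
                double fm = f.value(m);
                if (fa * fm <= 0.0) {          // sign change in [a, m]
                    b = m;
                } else {                       // sign change in [m, b]
                    a = m;
                    fa = fm;
                }
            }
            return a + 0.5 * (b - a);
        }
    }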
J.Pietschmann



Re: [math] proposed ordering for task list, scope of initial release

2003-06-11 Thread J.Pietschmann
Phil Steitz wrote:
That's where I started, but then Tim and others convinced me that it was 
actually better/more convenient for users for us to behave more like 
java.Math and java's own arithmetic functions -- which use NaN all over 
the place.
Uh, oh. That's probably because IEEE 854 does so. Returning
NaNs as well as throwing RuntimeExceptions is appropriate if
checking for problems would unnecessarily clutter the whole
program code, in particular if the exceptional conditions can
potentially occur often in a small amount of source code while
in reality occurring rarely. I mean, you certainly don't want to
declare an ArrayIndexOutOfBoundsException just because you want to
make an array access, in particular if the index has already
been checked elsewhere for other reasons.
Keep also in mind that NaNs had been invented before high level
languages generally acquired reasonable mechanisms for handling
exceptions, and that this means the hardware is designed to deal
with NaNs rather than throwing exceptions. Java probably adopted
NaNs mainly because checking every FP operation for a NaN would
have been an utter performance killer.
The question is: can the user be expected to provide valid input
to commons-math methods more often than not? If so, will checking
for a math exception clutter the user's routines too much?
 Also, from a usage standpoint, if we use checked exceptions 
everywhere, this is a bit inconvenient for users.  We need to find the 
right balance.
Exactly.

It is, however, common for libraries to use checked exceptions.

J.Pietschmann



Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Al Chou
--- Al Chou [EMAIL PROTECTED] wrote:
 --- Phil Steitz [EMAIL PROTECTED] wrote:
 [deletia]
  OK, long-winded disclaimer aside, here is how I see the task list ordered:
[deletia]
  * Framework and implementation strategie(s) for finding roots of
 real-valued
  functions of one (real) variable.  Here again -- largely done.  I would
  prefer
  to wait until J gets back and let him submit his framework and R. Brent's
  algorithm.  Then our Brent's implementation and usage can be integrated
  (actually not much to do, from the looks of the current code) and I will
 add
  my bean equations stuff (in progress).
 
 I may have time to submit my Ridders' method implementation using J.'s
 framework before he returns 2 days hence.  Should I bother to try, or should
 I
 wait until he submits his code as a patch via Bugzilla?

Well, I've just spent some time over the past 3 days reminding myself of some
of the things that are so hard about numerics.

I was testing my Ridders' method implementation and couldn't understand why it
took so many iterations to converge and still failed to come within the
requested accuracy of the known root I asked it to find.  I used a simple
quintic, (x+1)(x+0.5)(x)(x-0.5)(x-1), as the function whose roots I wanted to
find, and I made sure to give upper and lower bounds that I knew bracketed one
and only one root.  When trying to find the roots at x = +-0.5 my solver had
no trouble (though I didn't ask it how many of the 100 iterations it was
allowed that it actually used, until later), but the root at x = 0 was never
within even a factor of 15 of the requested 1e-6 accuracy, even when allowed
to take up to 200 iterations (actually, I used this test case first, which was
what prompted me to try the larger-valued roots in case I was seeing some loss
of precision or roundoff error effect).
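
For reference, that quintic coded against the hypothetical UnivariateFunction
interface sketched above in the thread; evaluating the factored form directly
keeps things well conditioned near the roots, echoing J.'s point about naive
evaluation of expanded polynomials (here the expansion would be
x^5 - 1.25x^3 + 0.25x):

    UnivariateFunction quintic = new UnivariateFunction() {
        public double value(double x) {
            // roots at -1, -0.5, 0, 0.5, 1
            return (x + 1.0) * (x + 0.5) * x * (x - 0.5) * (x - 1.0);
        }
    };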

BTW, in the process of using Herr Pietschmann's root finder framework, I
discovered a bug in setMaximalIterationCount (it sets
defaultMaximalIterationCount instead of maximalIterationCount).

I then decided to try Brent W.'s bisection solver, which converged to the
desired root to within its requested accuracy (1e-9) in 26 or 27 iterations
even for the root at x = 0.  At this point I asked my Ridders' method how many
iterations it took to find x = 0.5, and it said 1, and I realized that was
probably because my bracket values were symmetric (or close enough) about the
root, so its midpoint evaluation of the function found the root by coincidence.
 When I made sure the bracket values weren't symmetric about that root, I was
back to 146 iterations or more and not getting to within the requested accuracy
of the root location.

So I pulled out Herr Pietschmann's Brent method class and tested it, and it
threw an exception telling me, "Possibly multiple zeros in interval or ill
conditioned function."

The morals of the story are:
 - More-sophisticated algorithms that are supposed to converge faster don't
always do so
 - It's easy to outsmart yourself and create code that's too finicky for
non-numericist users.

As someone said recently on the list, a typical user probably is more
interested in an algorithm that's guaranteed to converge to a root (if there is
one) than in the rate of convergence, as long as it's not too ridiculously
slow.  Given that we've repeatedly determined that commons-math is not to be a
general numerical mathematics library, I think now that we should provide only
a bisection method in the initial release (assuming we achieve one) and spend
time later making our implementations of the more sophisticated algorithms more
user-friendly, if we find they're even needed.  I believe we've let ourselves
go down the path of as-yet-unjustified optimization in our designs, because we
know of algorithms that are supposed to be better.  I also have a greater
first-hand appreciation of the subtleties in NR's code that make it more robust
for the user, and I believe we can only achieve that level of robustness if we
take enough time -- which we should not take prior to the initial release,
because that would be too much time.

Finally, having used the Pietschmann root finder framework, I think it needs
some modification to make it more user-friendly.  As a lay user, I would have
been much happier dealing with Brent W.'s interface than Herr Pietschmann's,
which was kind of cumbersome.  I think, though, with a little slimming down, it
would be quite workable.


Al

=
Albert Davidson Chou

Get answers to Mac questions at http://www.Mac-Mgrs.org/ .




RE: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Al Chou
--- Brent Worden [EMAIL PROTECTED] wrote:
  -Original Message-
  From: Phil Steitz [mailto:[EMAIL PROTECTED]
  Sent: Friday, June 06, 2003 12:21 PM
[deletia]
  * Exponential growth and decay (set up for financial
  applications) I think this
  is just going to be a matter of finding the right formulas to add
  to MathUtils.
   I don't want to get carried away with financial computations,
  but some simple,
  commonly used formulas would be a nice addition to the package.
  We should also
  be thinking about other things to add to MathUtils -- religiously
  adhering to
  the guiding principles, of course.  Al's sign() is an excellent
  example of the
  kind of thing that we should be adding, IMHO.
 
 Things that might be added:
 Average of two numbers comes up a lot.

Do we muddy the class hierarchy by putting such a thing into MathUtils rather
than the stat subtree?


 Something similar to JUnit's assertEquals(double expected, double actual,
 double epsilon).

Is JUnit's license (http://www.opensource.org/licenses/ibmpl.php) Apache
compatible?


 Simple methods like isPositive, isNegative, etc. can be used to make boolean
 expressions more human readable.

I'm willing to build those two on top of sign (I'm so generous with my coding
time, eh? <g>).  Are those two sufficient?  sign treats 0 as positive, which
may not be desirable.


 Some other constants besides E and PI: golden ratio, Euler's gamma, sqrt(PI), etc.

That would be nice, though we should consider which ones are really needed
generally.  I personally love the lore of constants, of which there are more
than you might imagine (see
http://mathworld.wolfram.com/topics/Constants.html).


 I've used a default error constant several places.  It would be nice to come
 up with a central location for such values.

Or at least define a consistent interface that could be implemented by whatever
needs that.



Al

=
Albert Davidson Chou

Get answers to Mac questions at http://www.Mac-Mgrs.org/ .




Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Phil Steitz
Brent Worden wrote:
-Original Message-
From: Phil Steitz [mailto:[EMAIL PROTECTED]
Sent: Friday, June 06, 2003 12:21 PM
To: [EMAIL PROTECTED]
Subject: [math] proposed ordering for task list, scope of initial
release
Here is a *proposed* ordering for the task list, with a little commentary
added.
One thing that I want to make *very* clear up front, is that I
*never* intended
the task list or the items listed in the scope section of the
proposal to be
definitive.  All that is definitive are the guiding principles,
which just try
to keep us focused on stuff that people will find both useful and
easy to use.
I expected that the actual contents of the first release would
include some
things not on the list and would exclude some of the things
there.  At this
stage, as Juozas pointed out, it is more important for us to
build community
than to rush a release out the door. So if there are things that fit the
guidelines that others would like to contribute, but which are
not on the list,
*please* suggest them.  Also, for those who may not have dug into
the code, but
who may be interested in contributing, please rest assured that deep
mathematical knowledge is not required to help. We can review
implementations
and deal with mathematical problems as they arise, using our
small but growing
community as a resource.  The same is obviously true on the
Java/OS tools
side -- no need to be an expert to contribute.
OK, long-winded disclaimer aside, here is how I see the task list ordered:

* The RealMatrixImpl class is missing some key method implementations. The
critical thing is solution of linear systems. We need to implement a
numerically sound solution algorithm. This will enable inverse() and also
support general linear regression. -- I think that Brent is
working on this.


The only thing I've done is the Cholesky decomposition.  I haven't done
anything for the general linear system case.
Are you going to do this, or should I take it on?

* t-test statistic needs to be added and we should probably add
the capability
of actually performing t- and chi-square tests at fixed
significance levels
(.1, .05, .01, .001). -- This is virtually done, just need to
define a nice,
convenient interface for doing one- and two-tailed tests.  Thanks
to Brent, we
can actually support user-supplied significance levels (next item)


Anyone have any thoughts on the interface?  I was thinking of an Inference
interface that supports the conducting of one- and two-tailed tests as well
as constructing their complementary confidence intervals.  Or, if we want to
separate concerns create both a HypothesisTest and a ConfidenceInterval
interface, one for each type of inference.  Either way, I would use the
tried-and-true abstract factory way of creating inference instances.
Comments are welcome.

* numerical approximation of the t- and chi-square distributions to enable
user-supplied significance levels.  See above.  Someone just
needs to put a
fork in this. Tim? Brent?


Done.

Including the testing interface?  See below.


* *new* add support for F distribution and F test, so that we can report
significance level of the correlation coefficient in bivariate regression /
significance of the model.  I will do this if no one else wants to.


Done.  I'll probably knock out a few more easy continuous distributions to
get them out of the way.

* Framework and implementation strategie(s) for finding roots of
real-valued
functions of one (real) variable.  Here again -- largely done.  I
would prefer
to wait until J gets back and let him submit his framework and R. Brent's
algorithm.  Then our Brent's implementation and usage can be integrated
(actually not much to do, from the looks of the current code) and
I will add my
bean equations stuff (in progress).


Sounds good.


* Extend distribution framework to support discrete distributions
and implement
binomial and hypergeometric distributions.  I will do this if no
one else wants
to.  If someone else does it, you should make sure to use the log
binomials in
computations.


Binomial can easily be obtained using the regularized beta function that is
already defined.  Hypergeometric will be a little more work as I don't think
there's a compact formula to compute the cpf.
Using the log binomials, direct computation of the density might not be 
too bad.  I have not researched this, but that is what I was thinking.

  One thing to note: since the
discrete distributions do not have nice invertible mappings from critical
values to probabilities like those found for continuous distributions, how
should the inverseCumulativeProbability method work?  For a given
probability, p, should the method return one value, x, such that x is the
largest value where P(X <= x) <= p?  Or the smallest value, x, where
P(X <= x) >= p?  Or should the method return two values, x0 and x1, such that
P(X <= x0) <= p <= P(X <= x1)?
I think in the discrete case, we should supply the density function (and 
the cumulative probability 

Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Phil Steitz
Al Chou wrote:
--- Al Chou [EMAIL PROTECTED] wrote:

--- Phil Steitz [EMAIL PROTECTED] wrote:
[deletia]
OK, long-winded disclaimer aside, here is how I see the task list ordered:

[deletia]

* Framework and implementation strategie(s) for finding roots of
real-valued

functions of one (real) variable.  Here again -- largely done.  I would
prefer
to wait until J gets back and let him submit his framework and R. Brent's
algorithm.  Then our Brent's implementation and usage can be integrated
(actually not much to do, from the looks of the current code) and I will
add

my bean equations stuff (in progress).
I may have time to submit my Ridders' method implementation using J.'s
framework before he returns 2 days hence.  Should I bother to try, or should
I
wait until he submits his code as a patch via Bugzilla?


Well, I've just spent some time over the past 3 days reminding myself of some
of the things that are so hard about numerics.

BTW, in the process of using Herr Pietschmann's root finder framework, I
discovered a bug in setMaximalIterationCount (it sets
defaultMaximalIterationCount instead of maximalIterationCount).
So I pulled out Herr Pietschmann's Brent method class and tested it, and it
threw an exception telling me, "Possibly multiple zeros in interval or ill
conditioned function."
The morals of the story are:
 - More-sophisticated algorithms that are supposed to converge faster don't
always do so
 - It's easy to outsmart yourself and create code that's too finicky for
non-numericist users.
Good thing to keep reminding ourselves.

As someone said recently on the list, a typical user probably is more
interested in an algorithm that's guaranteed to converge to a root (if there is
one) than in the rate of convergence, as long as it's not too ridiculously
slow.  Given that we've repeatedly determined that commons-math is not to be a
general numerical mathematics library, I think now that we should provide only
a bisection method in the initial release (assuming we achieve one) and spend
time later making our implementations of the more sophisticated algorithms more
user-friendly, if we find they're even needed.  
+1, but maybe adding Secant method (I think J included this as well, if 
memory serves).


Finally, having used the Pietschmann root finder framework, I think it needs
some modification to make it more user-friendly.  As a lay user, I would have
been much happier dealing with Brent W.'s interface than Herr Pietschmann's,
which was kind of cumbersome.  I think, though, with a little slimming down, it
would be quite workable.
We should let J comment on this.  Also, the bean equations stuff that 
I am working on will be *very* easy to use (though less sophisticated).

Al

=
Albert Davidson Chou
Get answers to Mac questions at http://www.Mac-Mgrs.org/ .







Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Phil Steitz
Brent Worden wrote:
-Original Message-

* t-test statistic needs to be added and we should probably add
the capability
of actually performing t- and chi-square tests at fixed
significance levels
(.1, .05, .01, .001). -- This is virtually done, just need to
define a nice,
convenient interface for doing one- and two-tailed tests.  Thanks
to Brent, we
can actually support user-supplied significance levels (next item)


Anyone have any thoughts on the interface?  I was thinking of an Inference
interface that supports the conducting of one- and two-tailed tests as well
as constructing their complementary confidence intervals.  Or, if we want to
separate concerns create both a HypothesisTest and a ConfidenceInterval
interface, one for each type of inference.  Either way, I would use the
tried-and-true abstract factory way of creating inference instances.
Comments are welcome.
I have been thinking about this.  If I can stop sending emails for long 
enough to pull the patch together, I am about to submit a patch to 
BivariateRegression that adds the slope confidence interval computation 
and significance level, based on the new t-distribution impl (thanks, 
Brent!).  I thought about a generic ConfidenceInterval interface, but 
then thought that it would be more convenient for users to just return 
the halfwidth in double getSlopeConfidenceInterval(). To support the 
goal of testing model significance, I also added getSignificance().

I think the concrete stuff is easier to use and all we need at present. 
 Something like:

boolean twoTailedTTest(Univariate, Univariate, signif) or even
boolean twoTailedTTest(double[], double[], signif)
(obviously adding one-tailed tests and tests against constants as well, 
and tests that return doubles representing minimal p-values, possibly 
called significance)
boolean chiSquareTest(expected, observed, signif)
boolean chiSquareTest(Freq, Freq, signif)

To add the abstractions above meaningfully, we need to convince 
ourselves that either a) multiple implementation strategies might exist 
--  For parametric tests, this is not the case -- or b) the abstractions 
will make development of inferential components easier/more manageable. 
I am not sure about b). In fact, when I think about it I think that 
there is not much left when you abstract things to a high enough level 
to represent hypothesis testing and/or confidence intervals generically. 
I remember math stat students having a hard time understanding the 
abstract definitions of these concepts. I don't think that it is a good 
idea to force our users to think about these things.  Therefore, I would 
recommend sticking with concrete implementations defined close to the 
statistical applications.

Keep the user application use cases in mind.  If I want to determine 
whether the difference in two means is significant, I should be able to do 
that quickly and intuitively, with one method call, either using 
Univariates or double[]s.
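
A sketch of what such a one-call test might look like, assuming hypothetical
helper names, a Welch two-sample t statistic, and a pluggable t-distribution
quantile (none of this is settled API):

    interface TDistribution {
        /** Quantile of Student's t for the given degrees of freedom. */
        double inverseCumulativeProbability(double p, double degreesOfFreedom);
    }

    final class TTestSketch {

        /** True if the two-tailed test rejects equal means at level signif. */
        static boolean twoTailedTTest(double[] x, double[] y, double signif,
                                      TDistribution t) {
            int n = x.length, m = y.length;
            double vx = variance(x), vy = variance(y);
            double tStat = (mean(x) - mean(y)) / Math.sqrt(vx / n + vy / m);
            // Welch-Satterthwaite degrees of freedom
            double df = square(vx / n + vy / m)
                    / (square(vx / n) / (n - 1) + square(vy / m) / (m - 1));
            double critical = t.inverseCumulativeProbability(1.0 - signif / 2.0, df);
            return Math.abs(tStat) > critical;
        }

        static double square(double d) { return d * d; }

        static double mean(double[] v) {
            double sum = 0.0;
            for (int i = 0; i < v.length; i++) sum += v[i];
            return sum / v.length;
        }

        static double variance(double[] v) {
            double m = mean(v), sum = 0.0;
            for (int i = 0; i < v.length; i++) {
                double d = v[i] - m;
                sum += d * d;
            }
            return sum / (v.length - 1); // sample variance
        }
    }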



* numerical approximation of the t- and chi-square distributions to enable
user-supplied significance levels.  See above.  Someone just
needs to put a
fork in this. Tim? Brent?


Done.





Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread J.Pietschmann
Al Chou wrote:
I may have time to submit my Ridders' method implementation using J.'s
framework before he returns 2 days hence.  Should I bother to try, or should I
wait until he submits his code as a patch via Bugzilla?
I'm a bit short on spare time anyway.

J.Pietschmann



Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread J.Pietschmann
Phil Steitz wrote:
My philosophy on this is that whatever exceptions we define should be 
close to the components that throw them -- e.g. ConvergenceException. 
 I do not like the idea of a generic MathException.  As much as 
possible, I think that we should rely on the built-ins (including the 
extensions recently added to lang). Regarding ConvergenceException, I am 
on the fence for inclusion in the initial release, though I see 
something like this as eventually inevitable.  Correct me if I am wrong, 
but the only place that this is used now is in the dist package and we 
could either just throw a RuntimeException directly there or return NaN. 
 I do see the semantic value of ConvergenceException, however.
There are several approaches to designing a concept for exceptions,
all of which have pros and cons. I personally would suggest avoiding
returning NaNs and throwing RuntimeExceptions wherever possible, and
using a package-specific hierarchy of declared exceptions
instead.
J.Pietschmann





Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Al Chou
--- Phil Steitz [EMAIL PROTECTED] wrote:
 Al Chou wrote:
  --- Phil Steitz [EMAIL PROTECTED] wrote:
  
 Brent Worden wrote:
 
 -Original Message-
 From: Phil Steitz [mailto:[EMAIL PROTECTED]
 Sent: Friday, June 06, 2003 12:21 PM
 To: [EMAIL PROTECTED]
 Subject: [math] proposed ordering for task list, scope of initial
 release
 
  [deletia]
  
 Things that might be added:
 Average of two numbers comes up a lot.
 
 Yes. Some (of us) might not like the organization of this; but I have a 
 couple of times posted the suggestion that we add several
 double[]->double functions to MathUtils representing the core 
 computations for univariate -- mean, min, max, variance, sum, sumsq. 
 This would be convenient for users and us as well.  I guess I would not 
 be averse to moving these to stat.StatUtils, maybe just adding ave(x,y) 
 to MathUtils.
 
 Given the post that I just saw regarding financial computations, I 
 suggest that we let MathUtils grow a bit (including the double[]->double 
 functions) and then think about breaking it apart prior to release.  As 
 long as we stick to simple static methods, that will not be hard to do.
  
  
  Would it be considered poor form to provide these methods in MathUtils but
 have
  them delegate to the stat subtree of the class hierarchy?  That way all the
  actual code would be in one place, but we wouldn't force users to know that
  they're doing a statistical calculation when they just want average(x, y).
  
  
 I actually was thinking the other way around.  If you feel strongly 
 about keeping these things in stat, we can create StatUtils.  The point 
 is to encapsulate these basic functions so that a) users can get them 
 immediately without thinking about our stat abstractions and b) we can 
 get the storage-based computations of the basic quantities in one place. 
   When the UnivariateImpl window is finite, it should use the same 
 computations that AbstractStoreUnivariate does -- this is why we need to 
 encapsulate.

My organizational instincts say to put the implementation in stat and delegate
to it from MathUtils.  Probably 99% of actual use will consist of code calling
MathUtils (because no one will bother to learn that the implementation is
really in stat), but until we see a performance problem I'm strongly for
categorizing things as what they are (what they are in my mind, of course <g>).
 Avoiding premature optimization and YAGNI, and so on...
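
A minimal sketch of that delegation, with hypothetical class names (each class
in its own source file): the real computation lives in the stat package, and
MathUtils just forwards, so casual users never see the stat abstractions.

    public final class StatUtils {
        private StatUtils() {}

        public static double mean(double[] values) {
            double sum = 0.0;
            for (int i = 0; i < values.length; i++) {
                sum += values[i];
            }
            return sum / values.length;
        }
    }

    public final class MathUtils {
        private MathUtils() {}

        /** average(x, y) delegates to the stat implementation. */
        public static double average(double x, double y) {
            return StatUtils.mean(new double[] { x, y });
        }
    }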


 Some other constants besides E and PI: golden ratio, Euler's gamma, sqrt(PI), etc.
 I've used a default error constant several places.
 
It would be nice to come
 
 up with a central location for such values.
 
 I get the first 3, but what exactly do you mean by the default error 
 constant?
  
  
  I read that to mean the accuracy requested (aka allowable error) of a given
  algorithm invocation.
  
 
 But why would we ever want to define that as a constant?

I wouldn't, at least not as a global constant.  That's why I suggested we
define an interface that can be implemented by the classes that need this
functionality.  That way we'll have a consistent way to set the value for each
class that needs it.  Currently, Brent's bisection method hardcodes it, whereas
Herr Pietschmann's framework provides a getter/setter pair in an interface.  I
wonder if it's even possible to abstract further and pull the accuracy aspect
into a separate interface.  Accuracy/error _seems_ like a general concept, but
it could be too fuzzy a concept to yield a concrete interface specification.


 In addition to the above, has any thought gone into a set of application
 exceptions that will be thrown?  Are we going to rely on Java core
 exceptions or are we going to create some application specific exceptions?
 As I recall J uses a MathException in the solver routines and I added a
 ConvergenceException.  Should we expand that list or fold it into one
 generic application exception or do away with application exceptions
 altogether?
 
 My philosophy on this is that whatever exceptions we define should be 
 close to the components that throw them -- e.g. ConvergenceException. 
   I do not like the idea of a generic MathException.  As much as 
 possible, I think that we should rely on the built-ins (including the 
 extensions recently added to lang). Regarding ConvergenceException, I am 
 on the fence for inclusion in the initial release, though I see 
 something like this as eventually inevitable.  Correct me if I am wrong, 
 but the only place that this is used now is in the dist package and we 
 could either just throw a RuntimeException directly there or return NaN. 
   I do see the semantic value of ConvergenceException, however.  I guess 
 I would vote for keeping it.
  
  
  I agree that we should have exceptions be as specific as possible. 
  MathException could be an abstract parent for all of the commons-math
 exception
  classes, maybe.
  
 
 I do not see the need for an abstract hierarchy of math exceptions at 
 this time.  Of 

Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread J.Pietschmann
Al Chou wrote:
Finally, having used the Pietschmann root finder framework, I think it needs
some modification to make it more user-friendly.  As a lay user, I would have
been much happier dealing with Brent W.'s interface than Herr Pietschmann's,
which was kind of cumbersome.  I think, though, with a little slimming down, it
would be quite workable.
I'm interested in hearing a few more details: what makes the
framework cumbersome? Admittedly I didn't have time yet to
look at Brent's framework.
J.Pietschmann





Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Al Chou
--- J.Pietschmann [EMAIL PROTECTED] wrote:
 Al Chou wrote:
  I may have time to submit my Ridders' method implementation using J.'s
  framework before he returns 2 days hence.  Should I bother to try, or
 should I
  wait until he submits his code as a patch via Bugzilla?
 
 I'm a bit short on spare time anyway.

OK, I'll submit on your behalf.


Al

=
Albert Davidson Chou

Get answers to Mac questions at http://www.Mac-Mgrs.org/ .




Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Al Chou
--- J.Pietschmann [EMAIL PROTECTED] wrote:
 Al Chou wrote:
  Finally, having used the Pietschmann root finder framework, I think it
 needs
  some modification to make it more user-friendly.  As a lay user, I would
 have
  been much happier dealing with Brent W.'s interface than Herr
 Pietschmann's,
  which was kind of cumbersome.  I think, though, with a little slimming
 down, it
  would be quite workable.
 
 I'm interested in hearing a few more details: what makes the
 framework cumbersome? Admittedly I didn't have time yet to
 look at Brent's framework.

Having to instantiate an instance of the solver class seemed unnecessary. 
Brent's approach was to make the solver class' constructor private so that you
simply call

RootFinding.bisection( f, a, b )

rather than do

RootFinding rootFinder = new RootFinding();
double root = rootFinder.bisection( f, a, b );


That's a pretty easy change to make, although it prohibits the case of having
two solvers simultaneously with different accuracy requirements or suchlike. 
You'd have to

RootFinding.setAccuracy();

between calls to different function/solver bound pairs, but I don't see our
users needing to solve two equations with different accuracy requirements
anytime soon.
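
Another way out is to keep the static facade but let the accuracy travel with
the call, so two callers with different requirements never interfere. A
hypothetical signature, reusing the bisection loop sketched earlier in the
thread:

    public final class RootFinding {
        private RootFinding() {}  // no instances, as in Brent W.'s approach

        public static double bisection(UnivariateFunction f, double a, double b,
                                       double accuracy) {
            // per-call accuracy: no shared setAccuracy() state to trip over
            return BisectionSketch.bisect(f, a, b, accuracy);
        }
    }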



Al

=
Albert Davidson Chou

Get answers to Mac questions at http://www.Mac-Mgrs.org/ .




Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread O'brien, Tim
On Tue, 2003-06-10 at 14:23, Phil Steitz wrote:
 Al Chou wrote:
 I actually was thinking the other way around.  If you feel strongly 
 about keeping these things in stat, we can create StatUtils.  The point 
 is to encapsulate these basic functions so that a) users can get them 
 immediately without thinking about our stat abstractions and b) we can 
 get the storage-based computations of the basic quantities in one place. 

+1

   When the UnivariateImpl window is finite, it should use the same 
 computations that AbstractStoreUnivariate does -- this is why we need to 
 encapsulate.

+1

I agree with both of these ideas.  I think that putting everything in
MathUtils might become unwieldy - no problem with creating a StatUtils. 
(If that hasn't already been done; I'm checking my email for the first
time in days.)

Tim








Re: [math] Static Utils and Methods (was: Re: [math] proposed ordering for task list, scope of initial release)

2003-06-10 Thread Phil Steitz

--- Mark R. Diggory [EMAIL PROTECTED] wrote:
 
 
 Al Chou wrote:
  --- Phil Steitz [EMAIL PROTECTED] wrote:
  
 
 
 Simple methods like isPositive, isNegative, etc. can be used to make
 boolean expressions more human readable. I'm willing to build those two 
 on top of sign (I'm so generous with my
 coding time, eh? g).  Are those two sufficient?  sign treats 0 as
 positive,
 which may not be desirable.
 
 +1 (especially the part about your time :-)
 
 
 OK, I'll TDD those up, hopefully resolving the question of what to do
 about the sign of 0 in the process.
 
 
  Forgot to weigh in on this.  I would say that 0 is neither positive nor 
  negative.  If that is not a happy state, I would prefer to rename 
  isPositive to isNonNegative.  I know that is ugly; I have a hard time 
  calling 0 a positive number.  So, my first choice would be isPositive 
  and isNegative both failing for zero; second would be to rename as above.
  
  
  I tend to agree with you, except for the usage that I wrote sign() for in
 the
  first place.  Granted, that may be an unusual usage, so I'll keep your
 remarks
  in mind while I TDD.  Also, I just realized that I won't be submitting the
  Ridders' method code for the initial release anyway (at least as far as I
  know), so maybe sign() needs to change, given that it has no users that
 require
  the current behavior.
  
  
  Al
 
 
 [-1]
 
  Um, I'm not too clear on this one, how is calling 
  MathUtils.isPositive(d) clearer than (d >= 0)?
 
 I think the argument over implementation above is a clear enough reason 
 as to why something like this shouldn't be created. There is a standard 
 logic to evaluations in java that is elegant and mathematical in nature. 
 I'd fear we would just be reinventing the wheel here.
 
  I included Al's functions because they were a little more complex than 
  that; they provided a different return type when dealing with different 
  evaluations. Of course these could be captured inline quite easily as 
  well with examples like:
  
  d >= 0 ? 1d : -1d
  d > 0 ? 1d : -1d
  ...
 
  So again, I'm not sure how strong a benefit they provide in the long 
  run. I personally would probably exclude them on the basis that they are 
  overly simplified in comparison to what is already in MathUtils 
  (factorial and binomialCoefficient). It seems we should stick to 
  functionality that extends Math capabilities and not reinvent the 
  wheel of alternative math functionality already present in Java; the 
  sign() methods border on this case of functionality, and
  
  boolean isPositive(double d)
  
  definitely reinvents the wheel in a very big way. I think in general it's 
  best to keep static functions in MathUtils that simplify complex 
  calculations, like factorials.

Simple things are also good.  I like sign or sgn.  This is basic and missing
from Java.  You have a good point, however, re isPositive(), isNegative().  It's
really a matter of taste, what makes more readable code.

 
  Would it be considered poor form to provide these methods in MathUtils 
  but have
   them delegate to the stat subtree of the class hierarchy?  That way 
  all the
  actual code would be in one place, but we wouldn't force users to know 
  that
  they're doing a statistical calculation when they just want average(x, 
  y).
 
 
  I actually was thinking the other way around.  If you feel strongly 
  about keeping these things in stat, we can create StatUtils.  The point 
  is to encapsulate these basic functions so that a) users can get them 
  immediately without thinking about our stat abstractions and b) we can 
  get the storage-based computations of the basic quantities in one place. 
   When the UnivariateImpl window is finite, it should use the same 
  computations that AbstractStoreUnivariate does -- this is why we need to 
  encapsulate.
 
 I feel the need to wave a caution flag here. Using MathUtils as a ground 
 for exposing quick access to default functions is an interesting idea. 
 But I think it creates an interface situation that over-complicates 
 the library; having multiple ways to do something tends to create 
 confusion. I would recommend we focus more on solidifying the 
 implementations and then consider simple static access to certain 
 functionality in the future, after we have solid implementations in 
 place. And, I also suggest we base this on user response/need and not on 
 our initial expectations; if users like it and want it, we can add it.
 

I disagree. We need it ourselves, unless we want to duplicate code between
UnivariateImpl and AbstractStoreUnivariate.  Also, I personally, and I am sure
many other users, would like simple array-based functions for means, sums, etc.
If I have an array of doubles and all I want to do is compute its mean, I would
like to be able to do that directly, rather than having to instantiate a stat
object.


 I say this because I believe other developers will become confused as to 
 whether to use the static or OO (Object Oriented) way to use the 
 

Re: [math] Static Utils and Methods (was: Re: [math] proposed ordering for task list, scope of initial release)

2003-06-10 Thread Al Chou
--- Mark R. Diggory [EMAIL PROTECTED] wrote:
 I included Al's functions because they were a little more complex than 
 that; they provided a different return type when dealing with different 
 evaluations. Of course these could be captured inline quite easily as 
 well with examples like:
 
 d >= 0 ? 1d : -1d
 d > 0 ? 1d : -1d
 ...

I also want to point out that it's syntactically a little nicer to write

a * sign(b) * c

than

a * ( b > 0 ? 1.0 : -1.0 ) * c
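
For reference, a sketch of the sign() under discussion, assuming (as noted
earlier in the thread) that 0 is treated as positive:

    /** Returns 1.0 for d >= 0 and -1.0 otherwise; 0 counts as positive. */
    public static double sign(double d) {
        return d >= 0.0 ? 1.0 : -1.0;
    }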


 boolean isPositive(double d)
 
  definitely reinvents the wheel in a very big way. I think in general it's 
  best to keep static functions in MathUtils that simplify complex 
  calculations, like factorials.

That's an interesting point.  I wasn't super-keen on isPositive/isNegative, and
I confess I was tempted by the opportunity to reuse sign().  I'll hold off
further development for now.


  Would it be considered poor form to provide these methods in MathUtils 
  but have
   them delegate to the stat subtree of the class hierarchy?  That way 
  all the
  actual code would be in one place, but we wouldn't force users to know 
  that
  they're doing a statistical calculation when they just want average(x, 
  y).
 
 
  I actually was thinking the other way around.  If you feel strongly 
  about keeping these things in stat, we can create StatUtils.  The point 
  is to encapsulate these basic functions so that a) users can get them 
  immediately without thinking about our stat abstractions and b) we can 
  get the storage-based computations of the basic quantities in one place. 
   When the UnivariateImpl window is finite, it should use the same 
  computations that AbstractStoreUnivariate does -- this is why we need to 
  encapsulate.
 
 I feel the need to wave a caution flag here. Using MathUtils as a ground 
 for exposing quick access to default functions is an interesting idea. 
 But I think it creates an interface situation that over-complicates 
 the library; having multiple ways to do something tends to create 
 confusion. I would recommend we focus more on solidifying the 
 implementations and then consider simple static access to certain 
 functionality in the future, after we have solid implementations in 
 place. And, I also suggest we base this on user response/need and not on 
 our initial expectations; if users like it and want it, we can add it.
 
 I say this because I believe other developers will become confused as to 
 whether to use the static or OO (Object Oriented) way to use the 
 functionality when developing. If we have two different strategies for 
 accessing functionality, then we need to have design rules on how and where 
 to use each case in our own development.

Interesting point as well.  Not having encountered Java code that does this
kind of double-exposure of functionality, I'm not sure how I feel about it.  In
Ruby it doesn't seem to be a problem, but then I haven't worked on large
projects in that language, so again I may not have the experience to back up
any opinions.  I have seen this kind of dual interface in Perl modules (e.g.,
in CGI.pm), and there it seems to serve a useful purpose in providing syntactic
flexibility, although admittedly the performance of the static/procedural vs.
OO interfaces is disclaimed not to be identical.



Al

=
Albert Davidson Chou

Get answers to Mac questions at http://www.Mac-Mgrs.org/ .




Re: [math] Static Utils and Methods (was: Re: [math] proposed ordering for task list, scope of initial release)

2003-06-10 Thread Al Chou
--- O'brien, Tim [EMAIL PROTECTED] wrote:
 On Tue, 2003-06-10 at 16:26, Mark R. Diggory wrote:
  [-1]
  
  Um, I'm not too clear on this one, how is calling 
  MathUtils.isPositive(d) clearer than (d >= 0)?
 
 [+0], Mark, if I follow the discussion correctly, the concept isn't
 trying to ascertain if a given number is greater than or equal to zero. 
 I believe that the discussion revolved around the mathematical concept
 of Positive.  "Is a given number positive?" is a different question
 from "is a given number greater than or equal to zero?" - depending on your
 specific definition and needs.
 
 An application that needs to test for non-negative numbers would
 benefit from an isNonNegative method, even though the function simply
 contains d >= 0.  MathUtils.isNonNegative( 3 ) is conceptually different
 from 3 >= 0.  Personally, I would choose 3 >= 0, but if a programmer
 wished to invoke that operation via MathUtils.isNonNegative to attain a
 sort of conceptual purity, I don't think this is our decision to make.
 
  I included Al's functions because they were a little more complex than 
  that, they provided different return type when dealing with different 
  evaluations. Of course these could be captured inline quite easily as 
  well with examples like:
  
   d >= 0 ? 1d : -1d
   d > 0 ? 1d : -1d
 
  I'm not sure why that function would not return a boolean primitive;
  anyone have any good reasons not to?

I needed a function that returned a number so I could multiply by it.


  definitely reinvents the wheel in a very big way. I think in general its 
  best to keep static functions in MathUtil's that simplify complex 
  calculations like factorials.
 
 Again, I can see someone wanting these functions if one wants to be
 absolutely sure that they are complying with strict conceptual
 definitions in a very large system.  I don't personally have a need for
 isPositive, but that isn't to say that Al hasn't found a good reason to
 use them in the past.  
 
  Al?  What was the motivation here?

Wasn't my idea in the first place, I think it was Brent's.



Al

=
Albert Davidson Chou

Get answers to Mac questions at http://www.Mac-Mgrs.org/ .




Re: [math] Static Utils and Methods (was: Re: [math] proposed ordering for task list, scope of initial release)

2003-06-10 Thread Phil Steitz
Mark R. Diggory wrote:


Phil Steitz wrote:

--- Mark R. Diggory [EMAIL PROTECTED] wrote:


I disagree. We need it ourselves, unless we want to duplicate code 
between
UnivariateImpl and AbstractStoreUnivariate.  Also, I personally, and I 
am sure
many other users, would like simple array-based functions for means, 
sums, etc.
If I have an array of doubles and all I want to do is compute its mean, 
I would
like to be able to do that directly, rather than having to instantiate 
a stat
object.

If there is a strong motivation for it, then it should go in before 
release. But I'd really rather have the static functions be static 
delegates for the implementations, not the other way around. (This 
thought is defended later in this message.)
We need it now, to improve the computations in UnivariateImpl for the 
finite window case. I guess I am going to have to do this, since no one 
else seems interested.

In terms of duplicate code in Univar and StorUnivar, it's not obvious to 
me what the static interface of MathUtils or StatUtils has to do with 
this. My feelings are that UnivariateImpl should delegate to 
StoredUnivariateImpl in situations where storage is required.

MathUtils (or StatUtils) provides a low overhead, natural place to 
encapsulate the core computation, similar to java.Math. To have the 
UnivariateImpls delegate like this is not a good design, IMHO.  Think 
about what that would require in terms of instantiation, dependencies, 
etc.  It is a *much better* idea to encapsulate the common (very basic, 
btw) functionality, especially given that it is generically useful.  We 
will run in to *lots* of scenarios where we want to sum an array or find 
the min of an array.  It is silly to force all of these things to depend 
on and force instantiation of Univariates.



I say this because I believe other developers will become confused as 
to whether to use the static or OO (Object Oriented) way to use the 
functionality when developing. 


I disagree.  We should provide the flexibility to choose.  
Computationally
intensive applications may want to work directly with arrays (as we 
should
internally), while others will more naturally work with stat objects, 
or beans.

[defense] I agree, and I think in the case of Univariates (and other 
applications) that it would be best to supply methods for working with 
arrays; you should be able to hand Univar a double[] without having to 
iterate over it and add each value using addValue(...). There should be 
a method or constructor that uses such a double array directly for the 
calculation. Again, this means that MathUtils is just a static 
delegation point for such methods across different classes; those 
classes have to implement the methods that would get called to support 
such functionality.

I am suggesting to have such methods in MathUtils, but keep the 
implementations in the classes themselves.

That is backwards and inefficient, IMHO.  That would defeat the main 
purpose, which is to provide lightweight, efficient, cleanly 
encapsulated computational methods that the stat (and other) objects can 
use.

If we have two different strategies for

accessing functionality, then we need to have design rules on how and 
where to use each case in our own development.


I agree.  This is why I proposed adding the static double[] - double
computational methods -- so the many places where we will need them 
can all use
common, optimized implementations.


If I were writing a class that used other implementations in [math], I 
would use the implementations directly as much as possible and avoid 
usage via the static interface. I'd do this simply to support optimized 
object usage over constantly reinstantiating the objects that may get 
recreated every time such a static method is called. (Some others may 
disagree; I'm sure there's lots of room for opinion here.)
The point is to provide the users with a choice.  For some things, a 
Univariate is natural; for simple computations on arrays, it is overkill, 
IMHO.  For some situations, the BeanListUnivariate is natural.  There 
is no reason to limit things artificially or to resort to unnatural and 
inefficient implementation strategies when it is easy to expose the 
functionality.  Suppose that Math did not support sqrt().  Would we add 
this to some Univariate implementation and build spaghetti dependencies 
on that?  I don't think so.  This kind of thing fits naturally in a 
MathUtils class.  Similarly, the simple computational function sum: 
double[] |-> double belongs naturally in a StatUtils class.  Have a look 
at the *Utils classes in lang. These are among the most useful things in 
the package.

Phil

Cheers,
Mark







RE: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Brent Worden
 -Original Message-
 From: Al Chou [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, June 10, 2003 2:14 PM
 To: Jakarta Commons Developers List
 Subject: Re: [math] proposed ordering for task list, scope of initial
 release


 --- Phil Steitz [EMAIL PROTECTED] wrote:
  Brent Worden wrote:
   I've used a default error constant several places.
 It would be nice to come
   up with a central location for such values.
 
  I get the first 3, but what exactly do you mean by the default error
  constant?

 I read that to mean the accuracy requested (aka allowable error)
 of a given
 algorithm invocation.


That's right.  Certain routines perform their iterative computations until a
desired accuracy is achieved.  If the user doesn't explicitly state this
accuracy, what should it be?  A default error/accuracy constant would answer
that and provide a uniform level of accuracy throughout the library.

Brent Worden
http://www.brent.worden.org





RE: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Brent Worden


 -Original Message-
 From: J.Pietschmann [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, June 10, 2003 3:04 PM
 To: Jakarta Commons Developers List
 Subject: Re: [math] proposed ordering for task list, scope of initial
 release


 Phil Steitz wrote:
  My philosophy on this is that whatever exceptions we define should be
  close to the components that throw them -- e.g. ConvergenceException.
   I do not like the idea of a generic MathException.  As much as
  possible, I think that we should rely on the built-ins (including the
  extensions recently added to lang). Regarding
 ConvergenceException, I am
  on the fence for inclusion in the initial release, though I see
  something like this as eventually inevitable.  Correct me if I
 am wrong,
  but the only place that this is used now is in the dist package and we
  could either just throw a RuntimeException directly there or
 return NaN.
   I do see the semantic value of ConvergenceException, however.

 There are several approaches to designing a concept for exceptions,
 all of which have pros and cons. I personally would suggest avoiding
 returning NaNs and throwing RuntimeExceptions wherever possible, and
 using a package-specific hierarchy of declared exceptions
 instead.

 J.Pietschmann

I would agree whole-heartedly.

Brent Worden
http://www.brent.worden.org





Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Phil Steitz
Brent Worden wrote:
-Original Message-
From: Al Chou [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 10, 2003 2:14 PM
To: Jakarta Commons Developers List
Subject: Re: [math] proposed ordering for task list, scope of initial
release
--- Phil Steitz [EMAIL PROTECTED] wrote:

Brent Worden wrote:

I've used a default error constant several places.
  It would be nice to come

up with a central location for such values.
I get the first 3, but what exactly do you mean by the default error
constant?
I read that to mean the accuracy requested (aka allowable error)
of a given
algorithm invocation.


That's right.  Certain routines perform their iterative computations until a
desired accuracy is achieved.  If the user doesn't explicitly state this
accuracy, what should it be?  A default error/accuracy constant would answer
that and provide a uniform level of accuracy throughout the library.
Now I get it.  But I am not comfortable with the scope. I could see this 
defined for RootFinding or Distributions, etc., but not in general.  In 
general, the constant would have no meaning (to me, at least).  I would 
prefer to let individual implementations define their own defaults 
(specified in the javadoc, of course) and allow users to override.  A 
single default max iterations for both rootfinding and, e.g., numerical 
integration makes no sense.  Better to have the defaults scoped at the 
algorithm/implementation level.
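
A sketch of that per-implementation scoping, with hypothetical names: each
solver family carries its own documented defaults and exposes setters for
overriding them.

    public abstract class AbstractSolver {
        // Defaults are meaningful only for this family of algorithms and
        // would be documented in the javadoc of each concrete implementation.
        private double accuracy = 1.0e-6;
        private int maximalIterationCount = 100;

        public double getAccuracy() { return accuracy; }
        public void setAccuracy(double accuracy) { this.accuracy = accuracy; }

        public int getMaximalIterationCount() { return maximalIterationCount; }
        public void setMaximalIterationCount(int count) {
            // sets the live value, not the default (cf. the bug Al reported)
            this.maximalIterationCount = count;
        }
    }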

Brent Worden
http://www.brent.worden.org






Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Phil Steitz
Brent Worden wrote:

-Original Message-
From: J.Pietschmann [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 10, 2003 3:04 PM
To: Jakarta Commons Developers List
Subject: Re: [math] proposed ordering for task list, scope of initial
release
Phil Steitz wrote:

My philosophy on this is that whatever exceptions we define should be
close to the components that throw them -- e.g. ConvergenceException.
I do not like the idea of a generic MathException.  As much as
possible, I think that we should rely on the built-ins (including the
extensions recently added to lang). Regarding
ConvergenceException, I am

on the fence for inclusion in the initial release, though I see
something like this as eventually inevitable.  Correct me if I
am wrong,

but the only place that this is used now is in the dist package and we
could either just throw a RuntimeException directly there or
return NaN.

I do see the semantic value of ConvergenceException, however.
There are several approaches to design a concept for exceptions,
all of which have pros and cons. I personally would suggest to
avoid returning NaNs and throwing RuntimeExceptions wherever
possible and use a package specific hierarchy of declared exceptions
instead.
J.Pietschmann


I would agree whole-heartedly.

That's where I started, but then Tim and others convinced me that it was 
actually better/more convenient for users for us to behave more like 
java.lang.Math and Java's own arithmetic functions -- which use NaN all over 
the place.  Also, from a usage standpoint, if we use checked exceptions 
everywhere, this is a bit inconvenient for users.  We need to find the 
right balance.

I am on the fence on this whole issue.  I am interested in hearing more 
about what others may have in mind.

Phil

Brent Worden
http://www.brent.worden.org






RE: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Brent Worden


 -Original Message-
 From: J.Pietschmann [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, June 10, 2003 3:06 PM
 To: Jakarta Commons Developers List
 Subject: Re: [math] proposed ordering for task list, scope of initial
 release


 Al Chou wrote:
  Finally, having used the Pietschmann root finder framework, I
 think it needs
  some modification to make it more user-friendly.  As a lay
 user, I would have
  been much happier dealing with Brent W.'s interface than Herr
 Pietschmann's,
  which was kind of cumbersome.  I think, though, with a little
 slimming down, it
  would be quite workable.

 I'm interested in hearing a few more details: what makes the
 framework cumbersome? Admittedly I didn't have time yet to
 look at Brent's framework.

 J.Pietschmann


For clarification, I never meant for the bisection method to be the end-all
for root finding.  I just needed something to facilitate the distribution
implementations.  I would prefer using J's object approach to the static
method approach any day, if for no other reason than the inflexibility of
static methods.  They can't be overridden, they can't hold on to any state
(a nice feature in J's work), they can't be subclassed, ...

That being said, any design can be improved on (sorry J, even yours), but
the flavor of the object approach is, IMO, more agreeable than the static
method approach.  It is also in line with the direction most of the library
is beginning to take: complex algorithms encapsulated in strategy-type
objects which are interchangeable through a common interface.
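
To make that shape concrete, a minimal sketch (all names are illustrative,
nothing here is settled):

    // Illustrative only: a common interface makes the concrete
    // strategies (bisection, secant, Brent, ...) interchangeable.
    class ConvergenceException extends Exception {
        ConvergenceException(String message) { super(message); }
    }

    interface UnivariateFunction {
        double value(double x);
    }

    interface RootFinder {
        // Implementations may hold state between calls (iteration
        // counts, the last bracket, etc.), unlike static methods.
        double solve(UnivariateFunction f, double min, double max)
            throws ConvergenceException;
    }

A caller picks a concrete strategy once, e.g. RootFinder finder = new
BisectionSolver(), and the rest of the client code never changes.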

Brent Worden
http://www.brent.worden.org





Re: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Phil Steitz
Brent Worden wrote:

-Original Message-
From: J.Pietschmann [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 10, 2003 3:06 PM
To: Jakarta Commons Developers List
Subject: Re: [math] proposed ordering for task list, scope of initial
release
Al Chou wrote:

Finally, having used the Pietschmann root finder framework, I
think it needs

some modification to make it more user-friendly.  As a lay
user, I would have

been much happier dealing with Brent W.'s interface than Herr
Pietschmann's,

which was kind of cumbersome.  I think, though, with a little
slimming down, it

would be quite workable.
I'm interested in hearing a few more details: what makes the
framework cumbersome? Admittedly I didn't have time yet to
look at Brent's framework.
J.Pietschmann



For clarification, I never meant for the bisection method to be the end-all
for root finding.  I just needed something to facilitate the distribution
implementations.  
Works like a champ ;-)  I am having fun with these. I am thinking about 
publishing some critical value tables with the Apache license. he he.

I would prefer using J's object approach to the static
method approach any day, if for no other reason than the inflexibility of
static methods.  They can't be overridden, they can't hold on to any state
(a nice feature in J's work), they can't be subclassed, ...
This is an important point.  Despite my recent advocacy for a small set 
of static util methods, I strongly agree that we should never 
implement complex algorithms in static methods, and we should in general 
avoid statics for the reasons that you give above.
That being said, any design can be improved on (sorry J, even yours), but
the flavor of the object approach is, IMO, more agreeable than the static
method approach.  It is also in line with the direction most of the library
is beginning to take: complex algorithms encapsulated in strategy-type
objects which are interchangeable through a common interface.
I agree.  It would be nice to get J's framework in and refactor your 
Dist stuff to use it.  I would be OK with just including Bisection and 
Secant as initial implementations.  Other implementations could be added 
by us or users later.

Phil

Brent Worden
http://www.brent.worden.org






RE: [math] proposed ordering for task list, scope of initial release

2003-06-10 Thread Brent Worden
 -Original Message-
 From: Phil Steitz [mailto:[EMAIL PROTECTED]
 
 There are several approaches to design a concept for exceptions,
 all of which have pros and cons. I personally would suggest to
 avoid returning NaNs and throwing RuntimeExceptions wherever
 possible and use a package specific hierarchy of declared exceptions
 instead.
 
 J.Pietschmann
 
 
  I would agree whole-heartedly.
 

 That's where I started, but then Tim and others convinced me that it was
 actually better/more convenient for users for us to behave more like
  java.lang.Math and Java's own arithmetic functions -- which use NaN all over
 the place.

Here's a saying I've used in the past when debating colleagues: Just
because someone else does something, that doesn't make it right. :)

Also, from a usage standpoint, if we use checked exceptions
 everywhere, this is a bit inconvenient for users.  We need to find the
 right balance.

 I am on the fence on this whole issue.  I am interested in hearing more
 about what others may have in mind.

The big problem I have with returning NaN is that the caller has little
knowledge of why NaN is being returned.  If an exception is thrown, preferably
a specialized exception like ConvergenceException, the caller knows precisely
the reason for failure and can take appropriate recovery action.
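
To illustrate the difference (a fragment, assuming some solver and
function f are in scope; names hypothetical):

    // NaN: the caller can detect failure, but not diagnose it.
    double r = solver.solve(f, 0.0, 1.0);
    if (Double.isNaN(r)) {
        // Bad bracket?  Failure to converge?  No way to tell here.
    }

    // Checked exception: the failure mode is explicit and actionable.
    try {
        double root = solver.solve(f, 0.0, 1.0);
    } catch (ConvergenceException e) {
        // e.g. retry with a wider bracket or a looser accuracy
    }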


 Phil

Brent Worden
http://www.brent.worden.org





RE: [math] proposed ordering for task list, scope of initial release

2003-06-09 Thread Brent Worden
 -Original Message-
 From: Phil Steitz [mailto:[EMAIL PROTECTED]
 Sent: Friday, June 06, 2003 12:21 PM
 To: [EMAIL PROTECTED]
 Subject: [math] proposed ordering for task list, scope of initial
 release


 Here is a *proposed* ordering for the task list, with a little commentary
 added.

 One thing that I want to make *very* clear up front, is that I
 *never* intended
 the task list or the items listed in the scope section of the
 proposal to be
 definitive.  All that is definitive are the guiding principles,
 which just try
 to keep us focused on stuff that people will find both useful and
 easy to use.
 I expected that the actual contents of the first release would
 include some
 things not on the list and would exclude some of the things
 there.  At this
 stage, as Jouzas pointed out, it is more important for us to
 build community
 than to rush a release out the door. So if there are things that fit the
 guidelines that others would like to contribute, but which are
 not on the list,
 *please* suggest them.  Also, for those who may not have dug into
 the code, but
 who may be interested in contributing, please rest assured that deep
 mathematical knowledge is not required to help. We can review
 implementations
 and deal with mathematical problems as they arise, using our
 small but growing
 community as a resource.  The same is obviously true on the
 Java/OS tools
 side -- no need to be an expert to contribute.

 OK, long-winded disclaimer aside, here is how I see the task list ordered:

 * The RealMatrixImpl class is missing some key method implementations. The
 critical thing is solution of linear systems. We need to implement a
 numerically sound solution algorithm. This will enable inverse() and also
 support general linear regression. -- I think that Brent is
 working on this.

The only thing I've done is the Cholesky decomposition.  I haven't done
anything for the general linear system case.
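
For reference, the textbook form of the factorization is compact -- this
is just a sketch, not the actual patch:

    // Cholesky sketch: factor a symmetric positive-definite A
    // into L * L^T, returning the lower-triangular L.
    public static double[][] cholesky(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = a[i][j];
                for (int k = 0; k < j; k++) {
                    sum -= l[i][k] * l[j][k];
                }
                if (i == j) {
                    if (sum <= 0.0) {
                        throw new IllegalArgumentException(
                            "matrix is not positive definite");
                    }
                    l[i][i] = Math.sqrt(sum);
                } else {
                    l[i][j] = sum / l[j][j];
                }
            }
        }
        return l;
    }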

 * t-test statistic needs to be added and we should probably add
 the capability
 of actually performing t- and chi-square tests at fixed
 significance levels
 (.1, .05, .01, .001). -- This is virtually done, just need to
 define a nice,
 convenient interface for doing one- and two-tailed tests.  Thanks
 to Brent, we
 can actually support user-supplied significance levels (next item)

Anyone have any thoughts on the interface?  I was thinking of an Inference
interface that supports conducting one- and two-tailed tests as well
as constructing their complementary confidence intervals.  Or, if we want to
separate concerns, create both a HypothesisTest and a ConfidenceInterval
interface, one for each type of inference.  Either way, I would use the
tried-and-true abstract factory way of creating inference instances.
Comments are welcome.
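
A rough cut at the separated-concerns option, just to have something
concrete to react to (the signatures are guesses, not a settled API):

    // Signatures are guesses only.
    interface HypothesisTest {
        // true if the null hypothesis is rejected at the given level
        boolean testOneTailed(double[] sample, double hypothesizedValue,
                              double significanceLevel);
        boolean testTwoTailed(double[] sample, double hypothesizedValue,
                              double significanceLevel);
    }

    interface ConfidenceInterval {
        double lowerBound(double[] sample, double confidenceLevel);
        double upperBound(double[] sample, double confidenceLevel);
    }

    // Abstract factory for creating matched inference instances.
    abstract class InferenceFactory {
        abstract HypothesisTest newTTest();
        abstract ConfidenceInterval newTInterval();
    }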


 * numerical approximation of the t- and chi-square distributions to enable
 user-supplied significance levels.  See above.  Someone just
 needs to put a
 fork in this. Tim? Brent?

Done.


 * *new* add support for F distribution and F test, so that we can report
 significance level of the correlation coefficient in bivariate regression /
 significance of the model.  I will do this if no one else wants to.

Done.  I'll probably knock out a few more easy continuous distributions to
get them out of the way.


 * Framework and implementation strategy (or strategies) for finding roots of
 real-valued
 functions of one (real) variable.  Here again -- largely done.  I
 would prefer
 to wait until J gets back and let him submit his framework and R. Brent's
 algorithm.  Then our Brent's implementation and usage can be integrated
 (actually not much to do, from the looks of the current code) and
 I will add my
 bean equations stuff (in progress).

Sounds good.


 * Extend distribution framework to support discrete distributions
 and implement
 binomial and hypergeometric distributions.  I will do this if no
 one else wants
 to.  If someone else does it, you should make sure to use the log
 binomials in
 computations.

Binomial can easily be obtained using the regularized beta function that is
already defined.  Hypergeometric will be a little more work as I don't think
there's a compact formula to compute the cpf.  One thing to note: since the
discrete distributions do not have nice invertible mappings from critical
values to probabilities like those found for continuous distributions, how
should the inverseCumulativeProbability method work?  For a given
probability, p, should the method return one value, x, such that x is the
largest value where P(X <= x) <= p?  Or the smallest value, x, where
P(X <= x) >= p?  Or should the method return two values, x0 and x1, such
that P(X <= x0) <= p <= P(X <= x1)?
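
For what it's worth, the "smallest x" reading could be sketched like this
(assuming a cumulativeProbability(int) method; the support lower bound
accessor is hypothetical, and a real implementation would bracket and
bisect rather than scan):

    // Smallest x such that P(X <= x) >= p, found by upward search.
    public int inverseCumulativeProbability(double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        int x = getSupportLowerBound();  // hypothetical accessor
        while (cumulativeProbability(x) < p) {
            x++;
        }
        return x;
    }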


 * Exponential growth and decay (set up for financial
 applications) I think this
 is just going to be a matter of finding the right formulas to add
 to MathUtils.
  I don't want to get carried away with financial computations,
 but some simple,
 commonly used formulas would be a nice addition to the package.

Re: [math] proposed ordering for task list, scope of initial release

2003-06-08 Thread Al Chou
--- Phil Steitz [EMAIL PROTECTED] wrote:
 * Improve numerical accuracy of Univariate and BivariateRegression
 statistical
 computations. Encapsulate basic double[] -> double mean, variance, min,
 max
 computations using improved formulas and add these to MathUtils. (probably
 should add float[], int[], long[] versions as well.) Then refactor all
 univariate implementations that use stored values (including UnivariateImpl
 with finite window) to use the improved versions. -- Mark?  I am chasing
 down
 the TAS reference to document the source of the _NR_ formula, which I will
 add
 to the docs if someone else does the implementation.
  
  
  I was starting to code the updating (storage-less) variance formula, based
 on
  the Stanford article you cited, as a patch.  I believe the storage-using
  corrected two-pass algorithm is pretty trivial to code once we feel we're
 on
  solid ground with the reference to cite.
  
  
 OK. I finally got hold of the American Statistician article (had to 
 resort to the old trundle down to the local university library method) and 

Great!  Thanks.


 found lots of good stuff in it -- including a reference to Hanson's 
 recursive formula (from Stanford paper) and some empirical and 
 theoretical results confirming that NR 14.1.8 is about the best that you 
 can do for the stored case.  There is a refinement mentioned in which 
 pairwise summation is used (essentially splitting the sample in two 
 and computing the recursive sums in parallel); but the value of this 

I was wondering what the pairwise method was, and whether it was another name
for a technique we'd already discussed.  Sounds sort of like Shell's sort or
other recursive divide-and-conquer algorithms.


 only kicks in for large n.  I propose that we use NR 14.1.8 as is for 
 all stored computations.  Here is good text for the reference:

 Based on the <i>corrected two-pass algorithm</i> for computing the 
 sample variance, as described in "Algorithms for Computing the Sample 
 Variance: Analysis and Recommendations", Tony F. Chan, Gene H. Golub and 
 Randall J. LeVeque, <i>The American Statistician</i>, 1983, Vol. 37, 
 No. 3. (Eq. (1.7) on page 243.)
 
 The empirical investigation that the authors do uses the following trick 
 that I have thought about using to investigate the precision in our 
 stuff:  implement an algorithm using both floats and doubles and use the 
 double computations to assess stability of the algorithm implemented 
 using floats. Might want to play with this a little.

Yes, I skimmed part of the Stanford article and noticed that test technique. 
It's interesting, and as you say, we may want to experiment with it to see what
it can tell us.



Al

=
Albert Davidson Chou

Get answers to Mac questions at http://www.Mac-Mgrs.org/ .





Re: [math] proposed ordering for task list, scope of initial release

2003-06-07 Thread Al Chou
--- Phil Steitz [EMAIL PROTECTED] wrote:
[deletia]
 OK, long-winded disclaimer aside, here is how I see the task list ordered:
 
 * The RealMatrixImpl class is missing some key method implementations. The
 critical thing is solution of linear systems. We need to implement a
 numerically sound solution algorithm. This will enable inverse() and also
 support general linear regression. -- I think that Brent is working on this. 
  
 
 * Improve numerical accuracy of Univariate and BivariateRegression
 statistical
 computations. Encapsulate basic double[] -> double mean, variance, min, max
 computations using improved formulas and add these to MathUtils. (probably
 should add float[], int[], long[] versions as well.) Then refactor all
 univariate implementations that use stored values (including UnivariateImpl
 with finite window) to use the improved versions. -- Mark?  I am chasing down
 the TAS reference to document the source of the _NR_ formula, which I will
 add
 to the docs if someone else does the implementation.

I was starting to code the updating (storage-less) variance formula, based on
the Stanford article you cited, as a patch.  I believe the storage-using
corrected two-pass algorithm is pretty trivial to code once we feel we're on
solid ground with the reference to cite.


 * Define full package structure and develop user's guide following the
 package
 structure.  I have started work on the user's guide, but found this
 impossible
 without the package structure defined.  I will post a separate message
 summarizing what has been proposed up to now and making a recommendation.
 
 * t-test statistic needs to be added and we should probably add the
 capability
 of actually performing t- and chi-square tests at fixed significance levels
 (.1, .05, .01, .001). -- This is virtually done, just need to define a nice,
 convenient interface for doing one- and two-tailed tests.  Thanks to Brent,
 we
 can actually support user-supplied significance levels (next item)
 
 * numerical approximation of the t- and chi-square distributions to enable
 user-supplied significance levels.  See above.  Someone just needs to put a
 fork in this. Tim? Brent?
 
 * *new* add support for F distribution and F test, so that we can report
 significance level of the correlation coefficient in bivariate regression /
 significance of the model.  I will do this if no one else wants to.
 
 * Framework and implementation strategy (or strategies) for finding roots of real-valued
 functions of one (real) variable.  Here again -- largely done.  I would
 prefer
 to wait until J gets back and let him submit his framework and R. Brent's
 algorithm.  Then our Brent's implementation and usage can be integrated
 (actually not much to do, from the looks of the current code) and I will add
 my bean equations stuff (in progress).

I may have time to submit my Ridders' method implementation using J.'s
framework before he returns 2 days hence.  Should I bother to try, or should I
wait until he submits his code as a patch via Bugzilla?


 * Extend distribution framework to support discrete distributions and
 implement
 binomial and hypergeometric distributions.  I will do this if no one else
 wants
 to.  If someone else does it, you should make sure to use the log binomials
 in
 computations.
 
 * Exponential growth and decay (set up for financial applications) I think
 this
 is just going to be a matter of finding the right formulas to add to
 MathUtils.
  I don't want to get carried away with financial computations, but some
 simple,
 commonly used formulas would be a nice addition to the package. We should
 also
 be thinking about other things to add to MathUtils -- religiously adhering to
 the guiding principles, of course.  Al's sign() is an excellent example of the
 kind of thing that we should be adding, IMHO.

Thanks for the compliment!  I think I finally understand what you mean with the
exponential stuff:  compound interest calculation, for the most part, with
continuous compounding requiring the exponential.
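
A sketch of the kind of small utility that could land in MathUtils
(method names hypothetical):

    // Discrete vs. continuous compounding of a principal at annual
    // rate r over t years.
    public static double compound(double principal, double r,
                                  int periodsPerYear, double t) {
        return principal * Math.pow(1.0 + r / periodsPerYear,
                                    periodsPerYear * t);
    }

    public static double compoundContinuously(double principal,
                                              double r, double t) {
        return principal * Math.exp(r * t);  // A = P * e^(r t)
    }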


 * Polynomial Interpolation -- let Al tell us what to do here.  Even better,
 let Al do it (he he).   

I actually did some research last night (I told myself I was going to bed
early, hah) on rational function interpolation, trying to find a primary source
for the algorithm rather than rely again on a secondary source in the form of
NR.  I guess I'll continue along this path, as I really want a clean-room
implementation of it for my own use.  I'd feel better using rational functions
rather than polynomials for their generally larger radius of convergence.



Al

=
Albert Davidson Chou

Get answers to Mac questions at http://www.Mac-Mgrs.org/ .



Re: [math] proposed ordering for task list, scope of initial release

2003-06-07 Thread Phil Steitz
Al Chou wrote:


[deletia]**2

* Improve numerical accuracy of Univariate and BivariateRegression
statistical
computations. Encapsulate basic double[] -> double mean, variance, min, max
computations using improved formulas and add these to MathUtils. (probably
should add float[], int[], long[] versions as well.) Then refactor all
univariate implementations that use stored values (including UnivariateImpl
with finite window) to use the improved versions. -- Mark?  I am chasing down
the TAS reference to document the source of the _NR_ formula, which I will
add
to the docs if someone else does the implementation.


I was starting to code the updating (storage-less) variance formula, based on
the Stanford article you cited, as a patch.  I believe the storage-using
corrected two-pass algorithm is pretty trivial to code once we feel we're on
solid ground with the reference to cite.  
Yes.  I just wanted to propose the refactoring.

* Framework and implementation strategy (or strategies) for finding roots of real-valued
functions of one (real) variable.  Here again -- largely done.  I would
prefer
to wait until J gets back and let him submit his framework and R. Brent's
algorithm.  Then our Brent's implementation and usage can be integrated
(actually not much to do, from the looks of the current code) and I will add
my bean equations stuff (in progress).


I may have time to submit my Ridders' method implementation using J.'s
framework before he returns 2 days hence.  Should I bother to try, or should I
wait until he submits his code as a patch via Bugzilla?
I doubt that J would mind if someone else were to submit the framework 
(including his @author of course) from his post to the list.  You could 
combine his classes and yours into one patch and submit it if you have 
time to do this before he gets back.


* Polynomial Interpolation -- let Al tell us what to do here.  Even better,
let Al do it (he he).   


I actually did some research last night (I told myself I was going to bed
early, hah) on rational function interpolation, trying to find a primary source
for the algorithm rather than again rely on a secondary source in the form of
NR.  I guess I'll continue along this path, as I really want a clean room
implementation of it for my own use.  I'd feel better using rational functions
rather than polynomials for their generally larger radius of convergence.
Thanks for looking into this.  If you think rational functions are 
better, go for it.  One more thing to think about is splines. A natural 
spline implementation might be easier to document/understand from users' 
perspective. We might want to eventually support both (and maybe even 
polynomial interpolation).

Phil



Al

=
Albert Davidson Chou
Get answers to Mac questions at http://www.Mac-Mgrs.org/ .







Re: [math] proposed ordering for task list, scope of initial release

2003-06-07 Thread Phil Steitz

* Improve numerical accuracy of Univariate and BivariateRegression
statistical
computations. Encapsulate basic double[] -> double mean, variance, min, max
computations using improved formulas and add these to MathUtils. (probably
should add float[], int[], long[] versions as well.) Then refactor all
univariate implementations that use stored values (including UnivariateImpl
with finite window) to use the improved versions. -- Mark?  I am chasing down
the TAS reference to document the source of the _NR_ formula, which I will
add
to the docs if someone else does the implementation.


I was starting to code the updating (storage-less) variance formula, based on
the Stanford article you cited, as a patch.  I believe the storage-using
corrected two-pass algorithm is pretty trivial to code once we feel we're on
solid ground with the reference to cite.

OK. I finally got hold of the American Statistician article (had to 
resort to the old trundle down to the local university library method) and 
found lots of good stuff in it -- including a reference to Hanson's 
recursive formula (from Stanford paper) and some empirical and 
theoretical results confirming that NR 14.1.8 is about the best that you 
can do for the stored case.  There is a refinement mentioned in which 
pairwise summation is used (essentially splitting the sample in two 
and computing the recursive sums in parallel); but the value of this 
only kicks in for large n.  I propose that we use NR 14.1.8 as is for 
all stored computations.  Here is good text for the reference:

Based on the <i>corrected two-pass algorithm</i> for computing the 
sample variance, as described in "Algorithms for Computing the Sample 
Variance: Analysis and Recommendations", Tony F. Chan, Gene H. Golub and 
Randall J. LeVeque, <i>The American Statistician</i>, 1983, Vol. 37, 
No. 3. (Eq. (1.7) on page 243.)
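
In code, the corrected two-pass formula amounts to this (a sketch for the
stored case; the second accumulation corrects for round-off error in the
computed mean):

    // Sample variance via the corrected two-pass algorithm
    // (Chan/Golub/LeVeque eq. (1.7), NR 14.1.8).
    public static double variance(double[] x) {
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least 2 values");
        }
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            mean += x[i];
        }
        mean /= n;
        double sumSq = 0.0;   // sum of squared deviations
        double sumDev = 0.0;  // sum of deviations (0 in exact arithmetic)
        for (int i = 0; i < n; i++) {
            double dev = x[i] - mean;
            sumSq += dev * dev;
            sumDev += dev;
        }
        return (sumSq - sumDev * sumDev / n) / (n - 1);
    }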

The empirical investigation that the authors do uses the following trick 
that I have thought about using to investigate the precision in our 
stuff:  implement an algorithm using both floats and doubles and use the 
double computations to assess stability of the algorithm implemented 
using floats. Might want to play with this a little.

Phil




