Re: [math] proposed ordering for task list, scope of initial release
Phil Steitz wrote: That's where I started, but then Tim and others convinced me that it was actually better/more convenient for users for us to behave more like java.Math and java's own arithmetic functions -- which use NaN all over the place. Uh, oh. That's probably because IEEE 854 does so. Returning NaNs as well as throwing RuntimeExceptions is appropriate if checking for problems would unnecessarily clutter the program code, in particular if the exceptional conditions could potentially occur often in a small amount of source code while in reality occurring rarely. I mean, you certainly don't want to declare an ArrayIndexOutOfBoundsException just because you want to make an array access, in particular if the index has already been checked elsewhere for other reasons. Keep also in mind that NaNs had been invented before high level languages generally acquired reasonable mechanisms for handling exceptions, and that this means the hardware is designed to deal with NaNs rather than throwing exceptions. Java probably adopted NaNs mainly because checking every FP operation for a NaN would have been an utter performance killer. The question is: can the user be expected to provide valid input to commons-math methods more often than not? If so, will checking for a math exception clutter the user's routines too much? Also, from a usage standpoint, if we use checked exceptions everywhere, this is a bit inconvenient for users. We need to find the right balance. Exactly. It is, however, common for libraries to use checked exceptions. J.Pietschmann - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
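The two error-signaling styles being weighed above can be sketched side by side. This is a hypothetical illustration, not commons-math code; both class names and the method name are made up for the comparison:

```java
// Hypothetical sketch of the two error-signaling styles under discussion.
// Neither class is real commons-math API; names are made up for contrast.

// Style 1: behave like java.lang.Math -- return NaN on invalid input.
// Cheap and uncluttered, but the caller must remember to test for NaN.
class NanStyle {
    static double sqrtOf(double x) {
        if (x < 0) {
            return Double.NaN;
        }
        return Math.sqrt(x);
    }
}

// Style 2: a declared (checked) exception -- the compiler forces the
// caller to acknowledge the failure mode, at the cost of try/catch noise.
class CheckedStyle {
    static class MathInputException extends Exception {
        MathInputException(String msg) { super(msg); }
    }

    static double sqrtOf(double x) throws MathInputException {
        if (x < 0) {
            throw new MathInputException("negative argument: " + x);
        }
        return Math.sqrt(x);
    }
}
```

Note that `Double.NaN != Double.NaN`, so a caller who tests a NaN-style result with `==` silently gets the wrong answer; `Double.isNaN` is required, which is part of why NaN returns are easy to misuse.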
Re: [math] proposed ordering for task list, scope of initial release
Al Chou wrote: So I pulled out Herr Pietschmann's Brent method class and tested it, and it threw an exception telling me, "Possibly multiple zeros in interval or ill conditioned function." Caused by an incomplete and much too naive implementation. I now have a real implementation of Brent (Brent-Dekker) ready and could try to submit a patch over the weekend. - It's easy to outsmart yourself and create code that's too finicky for non-numericist users. Non-numericists (or whatever) tend to underestimate the traps in numerical calculation because the vast majority of problems behave well with modern algorithms most of the time. Unfortunately, unforeseen misbehaviour tends to come up at the worst possible time, often with the user barely noticing that something was wrong. In particular for root finding: - The function for which a zero is sought could be implemented badly, with excessive round-off error and/or bit cancellation, like naive evaluation of dense high order polynomials. This may significantly displace the zero point, and it often leads to multiple numerical roots where only one was analytically expected. - The function may be inherently or numerically ill conditioned, like x*sin(1/x) near zero or ((x-1)^1000)*x^50 for a 50 bit mantissa. - It's hard to know in advance when to trade performance for robustness. A criterion for root finders is how often the function is evaluated, and it is generally assumed that evaluation is expensive compared to any calculation the solver itself makes. This can make a difference between bisection, which gains a bit per evaluation and needs ~53 iterations for an improvement of 1E-16 in accuracy whether the function is well behaved or not, and Newton, which ideally doubles the correct bits per evaluation and needs ~5 iterations (each evaluating *two* functions) for a 1E-16 improvement. 
Obviously, if accuracy matters and function evaluation is slow, fast algorithms are hard to avoid, but precisely defining the necessary accuracy and telling what is "slow" can be time consuming and hair-raising. - Detailed knowledge about the function (and other aspects of the problem) beats all kinds of clever guesses by sophisticated solving engines every time. Most algorithms are only really robust if you can provide a bracket for the zero. For general functions, this is as hard as or harder than nailing down the root itself. If you know the function has a smooth second derivative and no zero in the first derivative in a certain interval (like x>1), just use Newton, if necessary with a numerical derivative, or the secant method without bracketing, and you'll get your root, if it exists. J.Pietschmann
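To make the convergence comparison above concrete, here is a minimal bisection sketch (illustrative only, not J's or Brent's actual code). Each pass through the loop halves the bracket, i.e. gains one bit of accuracy per function evaluation regardless of how well behaved the function is; that guaranteed-but-slow behavior is exactly the trade against Newton described above:

```java
import java.util.function.DoubleUnaryOperator;

// Minimal bisection sketch (illustrative; not the commons-math implementation).
// Requires a bracket [a, b] with f(a) and f(b) of opposite sign, and gains
// roughly one bit of accuracy per function evaluation.
class Bisection {
    static double solve(DoubleUnaryOperator f, double a, double b, double tol) {
        double fa = f.applyAsDouble(a);
        if (fa * f.applyAsDouble(b) > 0) {
            throw new IllegalArgumentException("root not bracketed by [a, b]");
        }
        while (b - a > tol) {
            double m = 0.5 * (a + b);
            double fm = f.applyAsDouble(m);
            if (fa * fm <= 0) {
                b = m;            // sign change lies in [a, m]
            } else {
                a = m;
                fa = fm;          // sign change lies in [m, b]
            }
        }
        return 0.5 * (a + b);
    }
}
```

The up-front bracket check is the robustness point made above: with a valid bracket, bisection cannot fail, but obtaining that bracket for a general function can be as hard as finding the root.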
RE: [math] proposed ordering for task list, scope of initial release
On Wed, 2003-06-11 at 00:15, Brent Worden wrote: > Here's a saying I've used in the past when debating colleagues: "Just > because someone else does something, that doesn't make it right." :) Please see the previous discussions on the issue in the Eyebrowse archive, read the relevant IEEE standards, and also see the two PS files concerning floating-point arithmetic in the commons-math developers guide. For more advanced algorithms a checked exception makes sense; for something like min() or max(), returning NaN makes good sense. Please read the material in question and submit patches accordingly.
RE: [math] proposed ordering for task list, scope of initial release
On Tue, 2003-06-10 at 23:26, Brent Worden wrote: > > There are several approaches to design a concept for exceptions, > > all of which have pros and cons. I personally would suggest to > > avoid returning NaNs and throwing RuntimeExceptions wherever > > possible and use a package specific hierarchy of declared exceptions > > instead. > > > > J.Pietschmann > > I would agree whole-heartedly. Returning Double.NaN for situations where it makes sense is a settled issue which was addressed about three weeks ago; please see the previous discussions on the issue through Eyebrowse. Tim > > Brent Worden > http://www.brent.worden.org
RE: [math] proposed ordering for task list, scope of initial release
> -Original Message- > From: Phil Steitz [mailto:[EMAIL PROTECTED] > >> > >>There are several approaches to design a concept for exceptions, > >>all of which have pros and cons. I personally would suggest to > >>avoid returning NaNs and throwing RuntimeExceptions wherever > >>possible and use a package specific hierarchy of declared exceptions > >>instead. > >> > >>J.Pietschmann > > > > > > I would agree whole-heartedly. > > > > That's where I started, but then Tim and others convinced me that it was > actually better/more convenient for users for us to behave more like > java.Math and java's own arithmetic functions -- which use NaN all over > the place. Here's a saying I've used in the past when debating colleagues: "Just because someone else does something, that doesn't make it right." :) Also, from a usage standpoint, if we use checked exceptions > everywhere, this is a bit inconvenient for users. We need to find the > right balance. > > I am on the fence on this whole issue. I am interested in hearing more > about what others may have in mind. The big problem I have with returning NaN is that the caller has little knowledge of why NaN is being returned. If an exception is thrown, preferably a specialized exception like ConvergenceException, the caller knows precisely the reason for failure and can take appropriate recovery action. > > Phil Brent Worden http://www.brent.worden.org
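Brent's point can be illustrated with a small sketch (hypothetical names; commons-math's eventual ConvergenceException need not look like this). With a specialized exception the caller can distinguish "did not converge" from other failures and react, for example by retrying with a larger iteration budget; a bare Double.NaN carries no such information:

```java
// Hypothetical specialized exception carrying the failure context.
class ConvergenceException extends Exception {
    final int iterationsUsed;
    ConvergenceException(String msg, int iterationsUsed) {
        super(msg);
        this.iterationsUsed = iterationsUsed;
    }
}

// Toy fixed-point iteration x -> cos(x), which converges to ~0.7390851.
class FixedPointDemo {
    static double iterate(int maxIter) throws ConvergenceException {
        double x = 1.0;
        for (int i = 0; i < maxIter; i++) {
            double next = Math.cos(x);
            if (Math.abs(next - x) < 1e-12) {
                return next;
            }
            x = next;
        }
        throw new ConvergenceException("no convergence after " + maxIter
                + " iterations", maxIter);
    }
}
```

A caller catching ConvergenceException can inspect iterationsUsed and retry with a bigger budget -- a recovery that is impossible when the only signal is NaN.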
Re: [math] proposed ordering for task list, scope of initial release
Brent Worden wrote: -Original Message- From: J.Pietschmann [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 10, 2003 3:06 PM To: Jakarta Commons Developers List Subject: Re: [math] proposed ordering for task list, scope of initial release Al Chou wrote: Finally, having used the Pietschmann root finder framework, I think it needs some modification to make it more user-friendly. As a lay user, I would have been much happier dealing with Brent W.'s interface than Herr Pietschmann's, which was kind of cumbersome. I think, though, with a little slimming down, it would be quite workable. I'm interested in hearing a few more details: what makes the framework cumbersome? Admittedly I didn't have time yet to look at Brent's framework. J.Pietschmann For clarification, I never meant for the bisection method to be the end-all for root finding. I just needed something to facilitate the distribution implementations. Works like a champ ;-) I am having fun with these. I am thinking about publishing some critical value tables under the Apache license. he he. I would prefer using J's object approach to the static method any day, if for no reason other than the inflexibility of static methods. They can't be overridden, they can't hold on to any state (a nice feature in J's work), they can't be subclassed, ... This is an important point. Despite my recent advocacy for a small set of static "util" methods, I strongly agree that we should never implement complex algorithms in static methods and we should in general avoid statics for the reasons that you give above. That being said, any design can be improved on (sorry J, even yours), but the flavor of the object approach is, IMO, more agreeable than the static method approach. It also is in line with the direction most of the library is beginning to take: complex algorithms encapsulated in strategy type objects which are interchangeable through a common interface. I agree. 
It would be nice to get J's framework in and refactor your Dist stuff to use it. I would be OK with just including Bisection and Secant as initial implementations. Other implementations could be added by us or users later. Phil Brent Worden http://www.brent.worden.org
RE: [math] proposed ordering for task list, scope of initial release
> -Original Message- > From: J.Pietschmann [mailto:[EMAIL PROTECTED] > Sent: Tuesday, June 10, 2003 3:06 PM > To: Jakarta Commons Developers List > Subject: Re: [math] proposed ordering for task list, scope of initial > release > > > Al Chou wrote: > > Finally, having used the Pietschmann root finder framework, I > think it needs > > some modification to make it more user-friendly. As a lay > user, I would have > > been much happier dealing with Brent W.'s interface than Herr > Pietschmann's, > > which was kind of cumbersome. I think, though, with a little > slimming down, it > > would be quite workable. > > I'm interested in hearing a few more details: what makes the > framework cumbersome? Admittedly I didn't have time yet to > look at Brent's framework. > > J.Pietschmann > For clarification, I never meant for the bisection method to be the end-all for root finding. I just needed something to facilitate the distribution implementations. I would prefer using J's object approach to the static method any day, if for no reason other than the inflexibility of static methods. They can't be overridden, they can't hold on to any state (a nice feature in J's work), they can't be subclassed, ... That being said, any design can be improved on (sorry J, even yours), but the flavor of the object approach is, IMO, more agreeable than the static method approach. It also is in line with the direction most of the library is beginning to take: complex algorithms encapsulated in strategy type objects which are interchangeable through a common interface. Brent Worden http://www.brent.worden.org
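The "strategy objects behind a common interface" direction Brent describes might be sketched as follows (hypothetical names, not the actual commons-math API). The caller codes against the interface and can swap implementations freely, and an instance can hold state such as an accuracy setting, which a static method cannot:

```java
// Hypothetical function and solver interfaces in the strategy style.
interface UnivariateFunction {
    double value(double x);
}

interface RootSolver {
    double solve(UnivariateFunction f, double a, double b);
}

// One interchangeable implementation; a SecantSolver, BrentSolver, etc.
// could implement the same interface and be dropped in by the caller.
class BisectionSolver implements RootSolver {
    private final double accuracy;   // per-instance state: impossible with statics

    BisectionSolver(double accuracy) {
        this.accuracy = accuracy;
    }

    public double solve(UnivariateFunction f, double a, double b) {
        while (b - a > accuracy) {
            double m = 0.5 * (a + b);
            if (f.value(a) * f.value(m) <= 0) {
                b = m;
            } else {
                a = m;
            }
        }
        return 0.5 * (a + b);
    }
}
```

Because the caller holds a RootSolver reference, switching algorithms is a one-line change at construction time, which is the interchangeability Brent is after.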
Re: [math] proposed ordering for task list, scope of initial release
Brent Worden wrote: -Original Message- From: J.Pietschmann [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 10, 2003 3:04 PM To: Jakarta Commons Developers List Subject: Re: [math] proposed ordering for task list, scope of initial release Phil Steitz wrote: My philosophy on this is that whatever exceptions we define should be "close" to the components that throw them -- e.g. ConvergenceException. I do not like the idea of a generic "MathException." As much as possible, I think that we should rely on the built-ins (including the extensions recently added to lang). Regarding ConvergenceException, I am on the fence for inclusion in the initial release, though I see something like this as eventually inevitable. Correct me if I am wrong, but the only place that this is used now is in the dist package, and we could either just throw a RuntimeException directly there or return NaN. I do see the semantic value of ConvergenceException, however. There are several approaches to design a concept for exceptions, all of which have pros and cons. I personally would suggest to avoid returning NaNs and throwing RuntimeExceptions wherever possible and use a package specific hierarchy of declared exceptions instead. J.Pietschmann I would agree whole-heartedly. That's where I started, but then Tim and others convinced me that it was actually better/more convenient for users for us to behave more like java.Math and java's own arithmetic functions -- which use NaN all over the place. Also, from a usage standpoint, if we use checked exceptions everywhere, this is a bit inconvenient for users. We need to find the right balance. I am on the fence on this whole issue. I am interested in hearing more about what others may have in mind. Phil Brent Worden http://www.brent.worden.org
Re: [math] proposed ordering for task list, scope of initial release
Brent Worden wrote: -Original Message- From: Al Chou [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 10, 2003 2:14 PM To: Jakarta Commons Developers List Subject: Re: [math] proposed ordering for task list, scope of initial release --- Phil Steitz <[EMAIL PROTECTED]> wrote: Brent Worden wrote: I've used a default error constant several places. It would be nice to come up with a central location for such values. I get the first 3, but what exactly do you mean by the default error constant? I read that to mean the accuracy requested (aka allowable error) of a given algorithm invocation. That's right. Certain routines perform their iterative computations until a desired accuracy is achieved. If the user doesn't explicitly state this accuracy, what should it be? A default error/accuracy constant would answer that and provide a uniform level of accuracy throughout the library. Now I get it. But I am not comfortable with the scope. I could see this defined for RootFinding or Distributions, etc., but not in general. In general, the constant would have no meaning (to me, at least). I would prefer to let individual implementations define their own defaults (specified in the javadoc of course) and allow users to override. A single default "max iterations" for both root finding and, e.g., numerical integration makes no sense. Better to have the defaults scoped at the algorithm/implementation level. Brent Worden http://www.brent.worden.org
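Phil's preference -- defaults scoped to the algorithm, documented, and user-overridable -- might look like this sketch (hypothetical class; not actual commons-math API):

```java
// Hypothetical solver configuration with its own documented defaults.
// Another algorithm (say, a numerical integrator) would declare different
// defaults; nothing is shared library-wide.
class IterativeSolverConfig {
    /** Default accuracy for this implementation (stated in its javadoc). */
    private double accuracy = 1.0e-10;
    /** Default iteration cap for this implementation. */
    private int maxIterations = 100;

    double getAccuracy() { return accuracy; }
    int getMaxIterations() { return maxIterations; }

    /** Users override per instance instead of editing a global constant. */
    void setAccuracy(double accuracy) { this.accuracy = accuracy; }
    void setMaxIterations(int maxIterations) { this.maxIterations = maxIterations; }
}
```

The design point is that the default lives next to the algorithm it parameterizes, so each implementation can pick a value that actually makes sense for its convergence behavior.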
RE: [math] proposed ordering for task list, scope of initial release
> -Original Message- > From: J.Pietschmann [mailto:[EMAIL PROTECTED] > Sent: Tuesday, June 10, 2003 3:04 PM > To: Jakarta Commons Developers List > Subject: Re: [math] proposed ordering for task list, scope of initial > release > > > Phil Steitz wrote: > > My philosophy on this is that whatever exceptions we define should be > > "close" to the components that throw them -- e.g. ConvergenceException. > > I do not like the idea of a generic "MathException." As much as > > possible, I think that we should rely on the built-ins (including the > > extensions recently added to lang). Regarding > ConvergenceException, I am > > on the fence for inclusion in the initial release, though I see > > something like this as eventually inevitable. Correct me if I > am wrong, > > but the only place that this is used now is in the dist package and we > > could either just throw a RuntimeException directly there or > return NaN. > > I do see the semantic value of ConvergenceException, however. > > There are several approaches to design a concept for exceptions, > all of which have pros and cons. I personally would suggest to > avoid returning NaNs and throwing RuntimeExceptions wherever > possible and use a package specific hierarchy of declared exceptions > instead. > > J.Pietschmann I would agree whole-heartedly. Brent Worden http://www.brent.worden.org
RE: [math] proposed ordering for task list, scope of initial release
> -Original Message- > From: Al Chou [mailto:[EMAIL PROTECTED] > Sent: Tuesday, June 10, 2003 2:14 PM > To: Jakarta Commons Developers List > Subject: Re: [math] proposed ordering for task list, scope of initial > release > > > --- Phil Steitz <[EMAIL PROTECTED]> wrote: > > Brent Worden wrote: > > > I've used a default error constant several places. > >It would be nice to come > > > up with a central location for such values. > > > > I get the first 3, but what exactly do you mean by the default error > > constant? > > I read that to mean the accuracy requested (aka allowable error) > of a given > algorithm invocation. > That's right. Certain routines perform their iterative computations until a desired accuracy is achieved. If the user doesn't explicitly state this accuracy, what should it be? A default error/accuracy constant would answer that and provide a uniform level of accuracy throughout the library. Brent Worden http://www.brent.worden.org
Re: [math] Static Utils and Methods (was: Re: [math] proposed ordering for task list, scope of initial release)
Mark R. Diggory wrote: Phil Steitz wrote: --- "Mark R. Diggory" <[EMAIL PROTECTED]> wrote: I disagree. We need it ourselves, unless we want to duplicate code between UnivariateImpl and AbstractStoreUnivariate. Also, I personally, and I am sure many other users, would like simple array-based functions for means, sums, etc. If I have an array of doubles and all I want to do is compute its mean, I would like to be able to do that directly, rather than having to instantiate a stat object. If there is a strong motivation for it, then it should go in before release. But I'd really rather have the static functions be static delegates for the implementations, not the other way around. (This thought is defended later in this message.) We need it now, to improve the computations in UnivariateImpl for the finite window case. I guess I am going to have to do this, since no one else seems interested. In terms of duplicate code in Univar and StorUnivar, it's not obvious to me what the static interface of MathUtils or StatUtils has to do with this. My feeling is that UnivariateImpl should delegate to StoredUnivariateImpl in situations where storage is required. MathUtils (or StatUtils) provides a low overhead, natural place to encapsulate the core computation, similar to java.Math. To have the UnivariateImpls delegate like this is not a good design, IMHO. Think about what that would require in terms of instantiation, dependencies, etc. It is a *much better* idea to encapsulate the common (very basic, btw) functionality, especially given that it is generically useful. We will run into *lots* of scenarios where we want to sum an array or find the min of an array. It is silly to force all of these things to depend on and force instantiation of Univariates. I say this because I believe other developers will become confused as to whether to use the static or OO (Object Oriented) way to use the functionality when developing. I disagree. We should provide the flexibility to choose. 
Computationally intensive applications may want to work directly with arrays (as we should internally), while others will more naturally work with stat objects, or beans. [defense] I agree, and I think in the case of Univariates (and other applications) that it would be best to supply methods for working with arrays; you should be able to hand Univar a double[] without having to iterate over it and add each value using addValue(...). There should be a method or constructor that uses such a double array directly for the calculation. Again, this means that MathUtils is just a static delegation point for such methods across different classes; those classes have to implement the methods that would get called to support such functionality. I am suggesting "to have" such methods in MathUtils, but keep the implementations in the classes themselves. That is backwards and inefficient, IMHO. That would defeat the main purpose, which is to provide lightweight, efficient, cleanly encapsulated computational methods that the stat (and other) objects can use. If we have two different strategies for accessing functionality, then we need to have design rules on how and where to use each case in our own development. I agree. This is why I proposed adding the static double[] -> double computational methods -- so the many places where we will need them can all use common, optimized implementations. If I were writing a class that used other implementations in [math], I would use the implementations directly as much as possible and avoid usage via the static interface. I'd do this simply to support optimized object usage over constantly reinstantiating the objects that may get recreated every time such a static method is called. (Some others may disagree; I'm sure there's lots of room for opinion here.) The point is to provide the users with a choice. For some things, a Univariate is natural; for simple computations on arrays, it is overkill, IMHO. 
For some situations, the BeanListUnivariate is natural. There is no reason to limit things artificially or to resort to unnatural and inefficient implementation strategies when it is easy to expose the functionality. Suppose that Math did not support sqrt(). Would we add this to some Univariate implementation and build spaghetti dependencies on that? I don't think so. This kind of thing fits naturally in a MathUtils class. Similarly, the simple computational function sum: double[] |-> double belongs naturally in a StatUtils class. Have a look at the *Utils classes in lang. These are among the most useful things in the package. Phil Cheers, Mark
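The StatUtils idea -- lightweight static double[] -> double methods that both end users and the Univariate implementations share -- might look like the sketch below. (Commons-math did later grow a real StatUtils class, but the class and method signatures here are assumptions, not its actual API.)

```java
// Sketch of a static utility class in the style of java.lang.Math:
// no state, no instantiation, just array-based computations that
// UnivariateImpl, AbstractStoreUnivariate, and end users could all reuse.
final class StatUtilsSketch {
    private StatUtilsSketch() {}  // static utility: no instances

    static double sum(double[] values) {
        double s = 0.0;
        for (double v : values) {
            s += v;
        }
        return s;
    }

    static double mean(double[] values) {
        return sum(values) / values.length;
    }

    static double min(double[] values) {
        double m = values[0];
        for (double v : values) {
            if (v < m) m = v;
        }
        return m;
    }
}
```

A user with a bare array calls StatUtilsSketch.mean(data) directly, and a windowed Univariate implementation can call the very same method on its internal buffer -- which is exactly the "one place for the storage-based computations" Phil is arguing for.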
Re: [math] Static Utils and Methods (was: Re: [math] proposed ordering for task list, scope of initial release)
--- "O'brien, Tim" <[EMAIL PROTECTED]> wrote: > On Tue, 2003-06-10 at 16:26, Mark R. Diggory wrote: > > [-1] > > > > Um, I'm not too clear on this one, how is calling > > MathUtils.isPositive(d) clearer than (d >= 0)? > > [+0], Mark, if I follow the discussion correctly, the concept isn't > trying to ascertain if a given number is greater than or equal to zero. > I believe that the discussion revolved around the mathematical concept > of "Positive". Is a given number "positive" is a different question > from is a given number greater than or equal to zero - depending on your > specific definition and needs. > > An application that needs to test for a Non-negative numbers, would > benefit from a isNonNegative method. Even though, the function simply > contains d >= 0. MathUtils.isNonNegative( 3 ) is conceptually different > from 3 >= 0. Personally, I would choose, "3 >= 0", but if a programmer > wished to invoke that operation via MathUtils.isNonNegative to attain a > sort of conceptual "purity", I don't think this is our decision to make. > > > I included Al's functions because they were a little more complex than > > that, they provided different return type when dealing with different > > evaluations. Of course these could be captured inline quite easily as > > well with examples like: > > > > d >= 0 ? 1d : -1d > > d > 0 ? 1d : -1d > > I'm not sure why that function would not return a boolean primitive, > anyone have any good reasons not to? I needed a function that returned a number so I could multiply by it. > > definitely reinvents the wheel in a very big way. I think in general its > > best to keep static functions in MathUtil's that simplify complex > > calculations like factorials. > > Again, I can see someone wanting these functions if one wants to be > absolutely sure that they are complying with strict conceptual > definitions in a very large system. 
I don't personally have a need for > isPositive, but that isn't to say that Al hasn't found a good reason to > use them in the past. > > Al? what was the motivation here? Wasn't my idea in the first place, I think it was Brent's. Al = Albert Davidson Chou Get answers to Mac questions at http://www.Mac-Mgrs.org/ .
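Al's "function that returned a number so I could multiply by it" is the sign() under discussion. A sketch of the assumed semantics, matching the inline expression `d >= 0 ? 1d : -1d` quoted above (note it treats 0 as positive, the behavior the thread questions elsewhere; Math.signum, which returns 0.0 for 0, only arrived later, in Java 5):

```java
// Sketch of the discussed sign() helper. Assumed semantics: 0 counts as
// positive, matching the inline expression d >= 0 ? 1d : -1d from the thread.
class SignSketch {
    static double sign(double d) {
        return d >= 0 ? 1.0 : -1.0;
    }
}
```

Returning a double rather than a boolean is the whole point: it lets an expression like a * sign(b) * c read cleanly, as Al notes elsewhere in the thread.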
Re: [math] Static Utils and Methods (was: Re: [math] proposed ordering for task list, scope of initial release)
--- "Mark R. Diggory" <[EMAIL PROTECTED]> wrote: > I included Al's functions because they were a little more complex than > that, they provided different return type when dealing with different > evaluations. Of course these could be captured inline quite easily as > well with examples like: > > d >= 0 ? 1d : -1d > d > 0 ? 1d : -1d > ... I also want to point out that it's syntactically a little nicer to write a * sign(b) * c than a * ( b > 0 ? 1.0 : -1.0 ) * c > boolean isPositive(double d) > > definitely reinvents the wheel in a very big way. I think in general its > best to keep static functions in MathUtil's that simplify complex > calculations like factorials. That's an interesting point. I wasn't super-keen on isPositive/isNegative, and I confess I was tempted by the opportunity to reuse sign(). I'll hold off further development for now. > >> Would it be considered poor form to provide these methods in MathUtils > >> but have > >> them delegate to the stat subtree of the class hierarchy. That way > >> all the > >> actual code would be in one place, but we wouldn't force users to know > >> that > >> they're doing a statistical calculation when they just want average(x, > >> y). > >> > >> > > I actually was thinking the other way around. If you feel strongly > > about keeping these things in stat, we can create StatUtils. The point > > is to encapsulate these basic functions so that a) users can get them > > immediately without thinking about our stat abstractions and b) we can > > get the storage-based computations of the basic quantities in one place. > > When the UnivariateImpl window is finite, it should use the same > > computations that AbstractStoreUnivariate does -- this is why we need to > > encapsulate. > > I feel the need to wave a caution flag here. Using MathUtils as a ground > for exposing quick access to "default" functions is an interesting idea. 
> But I think it creates an Interface situation that over-complicates > the library; having multiple ways to do something tends to create > confusion. I would recommend we focus more on solidifying the > implementations and then consider simple static access to certain > functionality in the future, after we have solid implementations in > place. And I also suggest we base this on user response/need and not on > our initial expectations; if users like it and want it, we can add it. > > I say this because I believe other developers will become confused as to > whether to use the static or OO (Object Oriented) way to use the > functionality when developing. If we have two different strategies for > accessing functionality, then we need to have design rules on how and where > to use each case in our own development. Interesting point as well. Not having encountered Java code that does this kind of double-exposure of functionality, I'm not sure how I feel about it. In Ruby it doesn't seem to be a problem, but then I haven't worked on large projects in that language, so again I may not have the experience to back up any opinions. I have seen this kind of dual interface in Perl modules (e.g., in CGI.pm), and there it seems to serve a useful purpose in providing syntactic flexibility, although admittedly the performance of the static/procedural vs. OO interfaces is disclaimed not to be identical. Al = Albert Davidson Chou Get answers to Mac questions at http://www.Mac-Mgrs.org/ .
Re: [math] Static Utils and Methods (was: Re: [math] proposed ordering for task list, scope of initial release)
--- "Mark R. Diggory" <[EMAIL PROTECTED]> wrote: > > > Al Chou wrote: > > --- Phil Steitz <[EMAIL PROTECTED]> wrote: > > > >> > > >Simple methods like isPositive, isNegative, etc. can be used to make > >boolean expressions more human readable. I'm willing to build those two > >on top of sign (I'm so generous with my > >coding time, eh? ). Are those two sufficient? sign treats 0 as > positive, > >which may not be desirable. > > +1 (especially the part about your time :-) > >>> > >>> > >>>OK, I'll TDD those up, hopefully resolving the question of what to do > about the sign of 0 in the process. > >> > >> > >>Forgot to weigh in on this. I would say that 0 is neither positive nor > >>negative. If that is not a happy state, I would prefer to call > >>isPositive, "isNonNegative". I know that is ugly, I have a hard time > >>calling 0 a positive number. So, my first should would be isPositive > >>and isNegative both fail for zero, second would be to rename as above. > > > > > > I tend to agree with you, except for the usage that I wrote sign() for in > the > > first place. Granted, that may be an unusual usage, so I'll keep your > remarks > > in mind while I TDD. Also, I just realized that I won't be submitting the > > Ridders' method code for the initial release anyway (at least as far as I > > know), so maybe sign() needs to change, given that it has no users that > require > > the current behavior. > > > > > > Al > > > [-1] > > Um, I'm not too clear on this one, how is calling > MathUtils.isPositive(d) clearer than (d >= 0)? > > I think the argument over implementation above is a clear enough reason > as to why something like this shouldn't be created. There is a standard > logic to evaluations in java that is elegant and mathematical in nature. > I'd fear we would just be reinventing the wheel here. > > I included Al's functions because they were a little more complex than > that, they provided different return type when dealing with different > evaluations. 
Of course these could be captured inline quite easily as > well with examples like: > > d >= 0 ? 1d : -1d > d > 0 ? 1d : -1d > ... > > So again, I'm not sure how strong a benefit they provide in the long > run. I personally would probably exclude them on the basis that they are > overly simplified in comparison to what is already in MathUtils > (factorial and binomialCoefficient). It seems we should stick to > functionality that "extends" Math capabilities and not create a new > wheel of alternative math functionality already present in java; the > sign() methods border on this case of functionality, and > > boolean isPositive(double d) > > definitely reinvents the wheel in a very big way. I think in general it's > best to keep static functions in MathUtils that simplify complex > calculations like factorials. Simple things are also good. I like sign or sgn. This is basic and missing from java. You have a good point, however, re isPositive(), isNegative(). It's really a matter of taste, what makes more readable code. > > >> Would it be considered poor form to provide these methods in MathUtils > >> but have > >> them delegate to the stat subtree of the class hierarchy? That way > >> all the > >> actual code would be in one place, but we wouldn't force users to know > >> that > >> they're doing a statistical calculation when they just want average(x, > >> y). > >> > >> > > I actually was thinking the other way around. If you feel strongly > > about keeping these things in stat, we can create StatUtils. The point > > is to encapsulate these basic functions so that a) users can get them > > immediately without thinking about our stat abstractions and b) we can > > get the storage-based computations of the basic quantities in one place. > > When the UnivariateImpl window is finite, it should use the same > > computations that AbstractStoreUnivariate does -- this is why we need to > > encapsulate. > > I feel the need to wave a caution flag here. 
Using MathUtils as a ground > for exposing quick access to "default" functions is an interesting idea. > But I think it creates an Interface situation that over-complicates > the library; having multiple ways to do something tends to create > confusion. I would recommend we focus more on solidifying the > implementations and then consider simple static access to certain > functionality in the future after we have solid implementations in > place. And, I also suggest we base this on user response/need and not on > our initial expectations; if users like it and want it, we can add it. > I disagree. We need it ourselves, unless we want to duplicate code between UnivariateImpl and AbstractStoreUnivariate. Also, I personally, and I am sure many other users, would like simple array-based functions for means, sums, etc. If I have an array of doubles and all I want to do is compute its mean, I would like to be able to do that directly, r
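[Editor's note] For concreteness, the methods under debate in this exchange might look roughly as follows. This is only a sketch: the method names, and especially the zero convention, are exactly what the thread is arguing about, not a settled API.

```java
// Illustrative sketch of the MathUtils additions under discussion.
// The zero convention here is the contested one: sign() treats 0 as positive,
// while isPositive()/isNegative() both fail for zero, per Phil's preference.
public final class MathUtils {

    private MathUtils() {} // static utility class, no instances

    /** Current behavior under debate: treats 0 as positive. */
    public static double sign(double d) {
        return d >= 0.0 ? 1.0 : -1.0;
    }

    /** Phil's reading: 0 is neither positive nor negative. */
    public static boolean isPositive(double d) {
        return d > 0.0;
    }

    public static boolean isNegative(double d) {
        return d < 0.0;
    }
}
```

Seeing the bodies side by side makes Mark's point tangible: each method is a one-liner over an expression Java already provides.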
Re: [math] proposed ordering for task list, scope of initial release
On Tue, 2003-06-10 at 14:23, Phil Steitz wrote: > Al Chou wrote: > I actually was thinking the other way around. If you feel strongly > about keeping these things in stat, we can create StatUtils. The point > is to encapsulate these basic functions so that a) users can get them > immediately without thinking about our stat abstractions and b) we can > get the storage-based computations of the basic quantities in one place. +1 > When the UnivariateImpl window is finite, it should use the same > computations that AbstractStoreUnivariate does -- this is why we need to > encapsulate. +1 I agree with both of these ideas. I think that putting everything in MathUtil might become unwieldy - no problem with creating a StatUtil. (If that hasn't already been done, I'm checking my email for the first time in days) Tim - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
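[Editor's note] The double[]->double functions Phil and Tim agree on might start from something like this. The class and method names are assumptions for illustration, not committed code.

```java
// Hypothetical StatUtils sketch: simple array-based statistics that users
// (and UnivariateImpl/AbstractStoreUnivariate) could call without touching
// the stat abstractions. Names are illustrative only.
public final class StatUtils {

    private StatUtils() {} // static utility class, no instances

    /** Sum of all elements. */
    public static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /** Arithmetic mean, or NaN for an empty array. */
    public static double mean(double[] values) {
        return values.length == 0 ? Double.NaN : sum(values) / values.length;
    }
}
```

The point of the encapsulation argument is that both the windowed and the stored Univariate implementations would delegate to these same loops.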
Re: [math] proposed ordering for task list, scope of initial release
--- "J.Pietschmann" <[EMAIL PROTECTED]> wrote: > Al Chou wrote: > > Finally, having used the Pietschmann root finder framework, I think it > needs > > some modification to make it more user-friendly. As a lay user, I would > have > > been much happier dealing with Brent W.'s interface than Herr > Pietschmann's, > > which was kind of cumbersome. I think, though, with a little slimming > down, it > > would be quite workable. > > I'm interested in hearing a few more details: what makes the > framework cumbersome? Admittedly I didn't have time yet to > look at Brent's framework. Having to instantiate the solver class seemed unnecessary. Brent's approach was to make the solver class' constructor private so that you simply call RootFinding.bisection( f, a, b ) rather than do RootFinding rootFinder = new RootFinding() ; double root = rootFinder.bisection( f, a, b ) ; That's a pretty easy change to make, although it prohibits the case of having two solvers simultaneously with different accuracy requirements or suchlike. You'd have to call RootFinding.setAccuracy() ; between calls to different function/solver bound pairs, but I don't see our users needing to solve two equations with different accuracy requirements anytime soon. Al = Albert Davidson Chou Get answers to Mac questions at http://www.Mac-Mgrs.org/ . __ Do you Yahoo!? Yahoo! Calendar - Free online calendar with sync to Outlook(TM). http://calendar.yahoo.com - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
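[Editor's note] A minimal sketch of the static-facade style Al describes. All names are illustrative (this is not Brent W.'s actual code), and the static accuracy field deliberately reproduces the shared-state drawback Al points out.

```java
// Static-facade solver sketch: private constructor, static entry points,
// one shared accuracy setting for every caller (the trade-off Al notes).
public final class RootFinding {

    private static double accuracy = 1.0e-6; // shared by every caller

    private RootFinding() {} // private constructor: no instances needed

    public static void setAccuracy(double a) {
        accuracy = a;
    }

    /** Bisection on f over [a, b]; assumes f(a) and f(b) have opposite signs. */
    public static double bisection(Function f, double a, double b) {
        double fa = f.value(a);
        for (int i = 0; i < 100 && (b - a) > accuracy; i++) {
            double m = 0.5 * (a + b);
            double fm = f.value(m);
            if (fa * fm <= 0.0) {
                b = m;          // sign change lies in [a, m]
            } else {
                a = m;          // sign change lies in [m, b]
                fa = fm;
            }
        }
        return 0.5 * (a + b);
    }

    /** Stand-in for the framework's own function interface. */
    public interface Function {
        double value(double x);
    }
}
```

With this shape, RootFinding.bisection(f, a, b) works in one call, but two concurrent solvers cannot carry different accuracies -- exactly the limitation discussed above.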
Re: [math] proposed ordering for task list, scope of initial release
--- "J.Pietschmann" <[EMAIL PROTECTED]> wrote: > Al Chou wrote: > > I may have time to submit my Ridders' method implementation using J.'s > > framework before he returns 2 days hence. Should I bother to try, or > should I > > wait until he submits his code as a patch via Bugzilla? > > I'm a bit short on spare time anyway. OK, I'll submit on your behalf. Al = Albert Davidson Chou Get answers to Mac questions at http://www.Mac-Mgrs.org/ . __ Do you Yahoo!? Yahoo! Calendar - Free online calendar with sync to Outlook(TM). http://calendar.yahoo.com - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [math] proposed ordering for task list, scope of initial release
--- Phil Steitz <[EMAIL PROTECTED]> wrote: > Al Chou wrote: > > --- Phil Steitz <[EMAIL PROTECTED]> wrote: > >>Al Chou wrote: > >>>--- Brent Worden <[EMAIL PROTECTED]> wrote: > >-Original Message- > >From: Phil Steitz [mailto:[EMAIL PROTECTED] > >Sent: Friday, June 06, 2003 12:21 PM [deletia] > Simple methods like isPositive, isNegative, etc. can be used to make > >>boolean > expressions more human readable. > >>> > >>> > >>>I'm willing to build those two on top of sign (I'm so generous with my > >>coding > >>>time, eh? ). Are those two sufficient? sign treats 0 as positive, > >>which > >>>may not be desirable. > >>> > >> > >>+1 (especially the part about your time :-) > > > > > > OK, I'll TDD those up, hopefully resolving the question of what to do about > the > > sign of 0 in the process. > > > Forgot to weigh in on this. I would say that 0 is neither positive nor > negative. If that is not a happy state, I would prefer to call > isPositive, "isNonNegative". I know that is ugly, I have a hard time > calling 0 a positive number. So, my first choice would be isPositive > and isNegative both fail for zero, second would be to rename as above. I tend to agree with you, except for the usage that I wrote sign() for in the first place. Granted, that may be an unusual usage, so I'll keep your remarks in mind while I TDD. Also, I just realized that I won't be submitting the Ridders' method code for the initial release anyway (at least as far as I know), so maybe sign() needs to change, given that it has no users that require the current behavior. Al = Albert Davidson Chou Get answers to Mac questions at http://www.Mac-Mgrs.org/ . __ Do you Yahoo!? Yahoo! Calendar - Free online calendar with sync to Outlook(TM). http://calendar.yahoo.com - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [math] proposed ordering for task list, scope of initial release
Al Chou wrote: Finally, having used the Pietschmann root finder framework, I think it needs some modification to make it more user-friendly. As a lay user, I would have been much happier dealing with Brent W.'s interface than Herr Pietschmann's, which was kind of cumbersome. I think, though, with a little slimming down, it would be quite workable. I'm interested in hearing a few more details: what makes the framework cumbersome? Admittedly, I haven't had time yet to look at Brent's framework. J.Pietschmann - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [math] proposed ordering for task list, scope of initial release
--- Phil Steitz <[EMAIL PROTECTED]> wrote: > Al Chou wrote: > > --- Phil Steitz <[EMAIL PROTECTED]> wrote: > > > >>Brent Worden wrote: > >> > -Original Message- > From: Phil Steitz [mailto:[EMAIL PROTECTED] > Sent: Friday, June 06, 2003 12:21 PM > To: [EMAIL PROTECTED] > Subject: [math] proposed ordering for task list, scope of initial > release > >>> > > [deletia] > > > >>>Things that might be added: > >>>Average of two numbers comes up a lot. > >> > >>Yes. Some (of us) might not like the organization of this; but I have a > >>couple of times posted the suggestion that we add several > >>double[]->double functions to MathUtils representing the core > >>computations for univariate -- mean, min, max, variance, sum, sumsq. > >>This would be convenient for users and us as well. I guess I would not > >>be averse to moving these to stat.StatUtils, maybe just adding ave(x,y) > >>to MathUtils. > >> > >>Given the post that I just saw regarding financial computations, I > >>suggest that we let MathUtils grow a bit (including the double[]->double > >>functions and then think about breaking it apart prior to release. As > >>long as we stick to simple static methods, that will not be hard to do. > > > > > > Would it be considered poor form to provide these methods in MathUtils but > have > > them delegate to the stat subtree of the class hierarchy. That way all the > > actual code would be in one place, but we wouldn't force users to know that > > they're doing a statistical calculation when they just want average(x, y). > > > > > I actually was thinking the other way around. If you feel strongly > about keeping these things in stat, we can create StatUtils. The point > is to encapsulate these basic functions so that a) users can get them > immediately without thinking about our stat abstractions and b) we can > get the storage-based computations of the basic quantities in one place. 
> When the UnivariateImpl window is finite, it should use the same > computations that AbstractStoreUnivariate does -- this is why we need to > encapsulate. My organizational instincts say to put the implementation in stat and delegate to it from MathUtils. Probably 99% of actual use will consist of code calling MathUtils (because no one will bother to learn that the implementation is really in stat), but until we see a performance problem I'm strongly for categorizing things as what they are (what they are in my mind, of course ). Avoiding premature optimization and YAGNI, and so on > >>>Some other constants besides E and PI: golden ratio, euler, sqrt(PI), etc. > >>>I've used a default error constant several places. > >> > >> It would be nice to come > >> > >>>up with a central location for such values. > >> > >>I get the first 3, but what exactly do you mean by the default error > >>constant? > > > > > > I read that to mean the accuracy requested (aka allowable error) of a given > > algorithm invocation. > > > > But why would we ever want to define that as a constant? I wouldn't, at least not as a global constant. That's why I suggested we define an interface that can be implemented by the classes that need this functionality. That way we'll have a consistent way to set the value for each class that needs it. Currently, Brent's bisection method hardcodes it, whereas Herr Pietschmann's framework provides a getter/setter pair in an interface. I wonder if it's even possible to abstract further and pull the accuracy aspect into a separate interface. Accuracy/error _seems_ like a general concept, but it could be too fuzzy a concept to yield a concrete interface specification. > >>>In addition to the above, has any thought gone into a set of application > >>>exceptions that will be thrown. Are we going to rely on Java core > >>>exceptions or are we going to create some application specific exceptions? 
> >>>As I recall J uses a MathException in the solver routines and I added a > >>>ConvergenceException. Should we expand that list or fold it into one > >>>generic application exception or do away with application exceptions all > >>>together? > >> > >>My philosophy on this is that whatever exceptions we define should be > >>"close" to the components that throw them -- e.g. ConvergenceException. > >> I do not like the idea of a generic "MathException." As much as > >>possible, I think that we should rely on the built-ins (including the > >>extensions recently added to lang). Regarding ConvergenceException, I am > >>on the fence for inclusion in the initial release, though I see > >>something like this as eventually inevitable. Correct me if I am wrong, > >>but the only place that this is used now is in the dist package and we > >>could either just throw a RuntimeException directly there or return NaN. > >> I do see the semantic value of ConvergenceException, however. I guess > >>I would vote for keeping it. > > > > > > I agree that we should ha
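[Editor's note] The accuracy interface mooted above could be as small as the following. The names are assumptions, not the framework's actual types; only the getter/setter idea comes from the thread.

```java
// A minimal accuracy-aspect interface, as discussed: classes that take a
// requested accuracy implement it, instead of a global error constant.
interface Accurate {

    /** Requested absolute accuracy of the result. */
    double getAccuracy();

    /** Set the requested absolute accuracy. */
    void setAccuracy(double accuracy);
}

// Example implementor (hypothetical): a solver with a sensible default
// rather than a hardcoded value.
class BisectionSolver implements Accurate {

    private double accuracy = 1.0e-6; // a default, not a hardcoded constant

    public double getAccuracy() {
        return accuracy;
    }

    public void setAccuracy(double accuracy) {
        this.accuracy = accuracy;
    }
}
```

This keeps "accuracy" a per-object setting, sidestepping the global-constant objection raised above.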
Re: [math] proposed ordering for task list, scope of initial release
Phil Steitz wrote: My philosophy on this is that whatever exceptions we define should be "close" to the components that throw them -- e.g. ConvergenceException. I do not like the idea of a generic "MathException." As much as possible, I think that we should rely on the built-ins (including the extensions recently added to lang). Regarding ConvergenceException, I am on the fence for inclusion in the initial release, though I see something like this as eventually inevitable. Correct me if I am wrong, but the only place that this is used now is in the dist package and we could either just throw a RuntimeException directly there or return NaN. I do see the semantic value of ConvergenceException, however. There are several approaches to designing an exception concept, all of which have pros and cons. I personally would suggest avoiding returning NaNs and throwing RuntimeExceptions wherever possible, and using a package-specific hierarchy of declared exceptions instead. J.Pietschmann - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
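[Editor's note] One way the package-specific hierarchy of declared (checked) exceptions J. suggests could look. Class names here are assumptions for illustration; ConvergenceException is the one already mentioned in the thread, and Phil's objection to a generic root class stands either way.

```java
// Sketch of a small checked-exception hierarchy: a package root plus
// specific subclasses thrown "close" to the components that fail.
class MathException extends Exception {
    MathException(String message) {
        super(message);
    }
}

class ConvergenceException extends MathException {
    ConvergenceException(String message) {
        super(message);
    }
}
```

Because these are checked, callers must either handle them or declare them, which is precisely the clutter-vs-safety trade-off debated earlier in the thread.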
Re: [math] proposed ordering for task list, scope of initial release
Al Chou wrote: I may have time to submit my Ridders' method implementation using J.'s framework before he returns 2 days hence. Should I bother to try, or should I wait until he submits his code as a patch via Bugzilla? I'm a bit short on spare time anyway. J.Pietschmann - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [math] proposed ordering for task list, scope of initial release
Al Chou wrote: --- Phil Steitz <[EMAIL PROTECTED]> wrote: Al Chou wrote: --- Brent Worden <[EMAIL PROTECTED]> wrote: -Original Message- From: Phil Steitz [mailto:[EMAIL PROTECTED] Sent: Friday, June 06, 2003 12:21 PM [deletia] Something similar to JUnit's assertEquals(double expected, double actual, double epsilon). This is a good idea. Is JUnit's license (http://www.opensource.org/licenses/ibmpl.php) Apache compatible? I think that Brent is talking about defining a new function called something like approximatelyEquals() that returned a boolean. The signature, semantics and implementation of this would be different from JUnit. Ah, OK. That could be useful indeed. Simple methods like isPositive, isNegative, etc. can be used to make boolean expressions more human readable. I'm willing to build those two on top of sign (I'm so generous with my coding time, eh? ). Are those two sufficient? sign treats 0 as positive, which may not be desirable. +1 (especially the part about your time :-) OK, I'll TDD those up, hopefully resolving the question of what to do about the sign of 0 in the process. Forgot to weigh in on this. I would say that 0 is neither positive nor negative. If that is not a happy state, I would prefer to call isPositive, "isNonNegative". I know that is ugly, I have a hard time calling 0 a positive number. So, my first choice would be isPositive and isNegative both fail for zero, second would be to rename as above. Al = Albert Davidson Chou Get answers to Mac questions at http://www.Mac-Mgrs.org/ . __ Do you Yahoo!? Yahoo! Calendar - Free online calendar with sync to Outlook(TM). http://calendar.yahoo.com - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
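[Editor's note] One possible shape for the approximatelyEquals() idea. The name, signature, and absolute-tolerance semantics are all assumptions here; the thread only agrees that it should return a boolean and differ from JUnit's assertEquals.

```java
// Hypothetical sketch: boolean floating-point comparison with an explicit
// absolute tolerance, independent of JUnit's assertion-style API.
public final class Precision {

    private Precision() {}

    /** True when the two doubles differ by no more than epsilon. */
    public static boolean approximatelyEquals(double expected, double actual, double epsilon) {
        return Math.abs(expected - actual) <= epsilon;
    }
}
```

A relative-tolerance variant would be the obvious companion, but absolute tolerance matches the assertEquals(expected, actual, epsilon) shape quoted above.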
Re: [math] proposed ordering for task list, scope of initial release
Brent Worden wrote: -Original Message- * t-test statistic needs to be added and we should probably add the capability of actually performing t- and chi-square tests at fixed significance levels (.1, .05, .01, .001). -- This is virtually done, just need to define a nice, convenient interface for doing one- and two-tailed tests. Thanks to Brent, we can actually support user-supplied significance levels (next item) Anyone have any thoughts on the interface? I was thinking of an Inference interface that supports the conducting of one- and two-tailed tests as well as constructing their complementary confidence intervals. Or, if we want to separate concerns, create both a HypothesisTest and a ConfidenceInterval interface, one for each type of inference. Either way, I would use the tried-and-true abstract factory way of creating inference instances. Comments are welcome. I have been thinking about this. If I can stop sending emails for long enough to pull the patch together, I am about to submit a patch to BivariateRegression that adds the slope confidence interval computation and significance level, based on the new t-distribution impl (thanks, Brent!). I thought about a generic ConfidenceInterval interface, but then thought that it would be more convenient for users to just return the halfwidth in double getSlopeConfidenceInterval(). To support the goal of testing model significance, I also added getSignificance(). I think the concrete stuff is easier to use and all we need at present. 
Something like: boolean twoTailedTTest(Univariate, Univariate,signif) or even boolean twoTailedTTest(double[],double[],signif) (obviously adding one-tailed tests and tests against constants as well and tests that return doubles representing minimal p-values, possibly called "significance") boolean chiSquareTest(expected, observed, signif) boolean chiSquareTest(Freq, Freq, signif) To add the abstractions above meaningfully, we need to convince ourselves that either a) multiple implementation strategies might exist -- For parametric tests, this is not the case -- or b) the abstractions will make development of inferential components easier/more manageable. I am not sure about b). In fact, when I think about it I think that there is not much left when you abstract things to a high enough level to represent hypothesis testing and/or confidence intervals generically. I remember math stat students having a hard time understanding the abstract definitions of these concepts. I don't think that it is a good idea to force our users to think about these things. Therefore, I would recommend sticking with concrete implementations defined "close to" the statistical applications. Keep the user application use cases in mind. If I want to determine whether the difference in two means is significant, I should be able to do that quickly and intuitively, with one method call either using Univariates or double[]s. * numerical approximation of the t- and chi-square distributions to enable user-supplied significance levels. See above. Someone just needs to put a fork in this. Tim? Brent? Done. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
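[Editor's note] The concrete, array-based style Phil argues for might start from the pooled two-sample t statistic below, leaving the distribution lookup (and hence the boolean test against a significance level) to the dist package. This is a sketch with assumed names, not the patch under discussion; it assumes both samples have at least two values.

```java
// Pooled two-sample t statistic, the core of a double[]-based twoTailedTTest.
public final class TestStatistics {

    private TestStatistics() {}

    /** t statistic for the difference of means, pooled variance estimate. */
    public static double t(double[] x, double[] y) {
        double mx = mean(x), my = mean(y);
        double vx = variance(x, mx), vy = variance(y, my);
        int nx = x.length, ny = y.length;
        // pooled (weighted) variance with nx + ny - 2 degrees of freedom
        double pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2);
        return (mx - my) / Math.sqrt(pooled * (1.0 / nx + 1.0 / ny));
    }

    private static double mean(double[] v) {
        double s = 0.0;
        for (double d : v) s += d;
        return s / v.length;
    }

    private static double variance(double[] v, double m) {
        double s = 0.0;
        for (double d : v) s += (d - m) * (d - m);
        return s / (v.length - 1); // sample variance, n - 1 denominator
    }
}
```

A boolean twoTailedTTest(double[], double[], signif) would then compare the p-value of this statistic, obtained from the t-distribution implementation, against signif.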
Re: [math] proposed ordering for task list, scope of initial release
Al Chou wrote: --- Al Chou <[EMAIL PROTECTED]> wrote: --- Phil Steitz <[EMAIL PROTECTED]> wrote: [deletia] OK, long-winded disclaimer aside, here is how I see the task list ordered: [deletia] * Framework and implementation strategy(ies) for finding roots of real-valued functions of one (real) variable. Here again -- largely done. I would prefer to wait until J gets back and let him submit his framework and R. Brent's algorithm. Then "our" Brent's implementation and usage can be integrated (actually not much to do, from the looks of the current code) and I will add my "bean equations" stuff (in progress). I may have time to submit my Ridders' method implementation using J.'s framework before he returns 2 days hence. Should I bother to try, or should I wait until he submits his code as a patch via Bugzilla? Well, I've just spent some time over the past 3 days reminding myself of some of the things that are so hard about numerics. BTW, in the process of using Herr Pietschmann's root finder framework, I discovered a bug in setMaximalIterationCount (it sets defaultMaximalIterationCount instead of maximalIterationCount). So I pulled out Herr Pietschmann's Brent method class and tested it, and it threw an exception telling me, "Possibly multiple zeros in interval or ill conditioned function." The morals of the story are: - More-sophisticated algorithms that are supposed to converge faster don't always do so - It's easy to outsmart yourself and create code that's too finicky for non-numericist users. Good thing to keep reminding ourselves. As someone said recently on the list, a typical user probably is more interested in an algorithm that's guaranteed to converge to a root (if there is one) than in the rate of convergence, as long as it's not too ridiculously slow. 
Given that we've repeatedly determined that commons-math is not to be a general numerical mathematics library, I think now that we should provide only a bisection method in the initial release (assuming we achieve one) and spend time later making our implementations of the more sophisticated algorithms more user-friendly, if we find they're even needed. +1, but maybe adding Secant method (I think J included this as well, if memory serves). Finally, having used the Pietschmann root finder framework, I think it needs some modification to make it more user-friendly. As a lay user, I would have been much happier dealing with Brent W.'s interface than Herr Pietschmann's, which was kind of cumbersome. I think, though, with a little slimming down, it would be quite workable. We should let J comment on this. Also, the "bean equations" stuff that I am working on will be *very* easy to use (though less sophisticated). Al = Albert Davidson Chou Get answers to Mac questions at http://www.Mac-Mgrs.org/ . __ Do you Yahoo!? Yahoo! Calendar - Free online calendar with sync to Outlook(TM). http://calendar.yahoo.com - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [math] proposed ordering for task list, scope of initial release
Al Chou wrote: --- Brent Worden <[EMAIL PROTECTED]> wrote: -Original Message- From: Phil Steitz [mailto:[EMAIL PROTECTED] Sent: Friday, June 06, 2003 12:21 PM [deletia] * Exponential growth and decay (set up for financial applications) I think this is just going to be a matter of finding the right formulas to add to MathUtils. I don't want to get carried away with financial computations, but some simple, commonly used formulas would be a nice addition to the package. We should also be thinking about other things to add to MathUtils -- religiously adhering to the guiding principles, of course. Al's sign() is an excellent example of the kind of thing that we should be adding, IMHO. Things that might be added: Average of two numbers comes up a lot. Do we muddy the class hierarchy by putting such a thing into MathUtils rather than the stat subtree? Something similar to JUnit's assertEquals(double expected, double actual, double epsilon). This is a good idea. Is JUnit's license (http://www.opensource.org/licenses/ibmpl.php) Apache compatible? I think that Brent is talking about defining a new function called something like approximatelyEquals() that returned a boolean. The signature, semantics and implementation of this would be different from JUnit. Simple methods like isPositive, isNegative, etc. can be used to make boolean expressions more human readable. I'm willing to build those two on top of sign (I'm so generous with my coding time, eh? ). Are those two sufficient? sign treats 0 as positive, which may not be desirable. +1 (especially the part about your time :-) Some other constants besides E and PI: golden ratio, euler, sqrt(PI), etc. That would be nice, though we should consider which ones are really needed generally. I personally love the lore of constants, of which there are more than you might imagine (see http://mathworld.wolfram.com/topics/Constants.html). I've used a default error constant several places. 
It would be nice to come up with a central location for such values. Or at least define a consistent interface that could be implemented by whatever needs that. Al = Albert Davidson Chou Get answers to Mac questions at http://www.Mac-Mgrs.org/ . __ Do you Yahoo!? Yahoo! Calendar - Free online calendar with sync to Outlook(TM). http://calendar.yahoo.com - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
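[Editor's note] For reference, the constants mentioned might be collected as follows. The class name is assumed; the values are from standard references ("euler" here is read as the Euler-Mascheroni constant, since Math.E already exists).

```java
// Candidate constants beyond Math.E and Math.PI, as floated in the thread.
public final class MathConstants {

    private MathConstants() {}

    /** Golden ratio, (1 + sqrt(5)) / 2. */
    public static final double GOLDEN_RATIO = 1.618033988749895;

    /** Euler-Mascheroni constant, gamma. */
    public static final double EULER = 0.5772156649015329;

    /** Square root of pi. */
    public static final double SQRT_PI = 1.7724538509055159;
}
```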
Re: [math] proposed ordering for task list, scope of initial release
Brent Worden wrote: -Original Message- From: Phil Steitz [mailto:[EMAIL PROTECTED] Sent: Friday, June 06, 2003 12:21 PM To: [EMAIL PROTECTED] Subject: [math] proposed ordering for task list, scope of initial release Here is a *proposed* ordering for the task list, with a little commentary added. One thing that I want to make *very* clear up front is that I *never* intended the task list or the items listed in the scope section of the proposal to be definitive. All that is definitive are the guiding principles, which just try to keep us focused on stuff that people will find both useful and easy to use. I expected that the actual contents of the first release would include some things not on the list and would exclude some of the things there. At this stage, as Juozas pointed out, it is more important for us to build community than to rush a release out the door. So if there are things that fit the guidelines that others would like to contribute, but which are not on the list, *please* suggest them. Also, for those who may not have dug into the code, but who may be interested in contributing, please rest assured that deep mathematical knowledge is not required to help. We can review implementations and deal with mathematical problems as they arise, using our small but growing community as a resource. The same is obviously true on the Java/OS tools side -- no need to be an expert to contribute. OK, long-winded disclaimer aside, here is how I see the task list ordered: * The RealMatrixImpl class is missing some key method implementations. The critical thing is solution of linear systems. We need to implement a numerically sound solution algorithm. This will enable inverse() and also support general linear regression. -- I think that Brent is working on this. The only thing I've done is the Cholesky decomposition. I haven't done anything for the general linear system case. Are you going to do this, or should I take it on? 
* t-test statistic needs to be added and we should probably add the capability of actually performing t- and chi-square tests at fixed significance levels (.1, .05, .01, .001). -- This is virtually done, just need to define a nice, convenient interface for doing one- and two-tailed tests. Thanks to Brent, we can actually support user-supplied significance levels (next item) Anyone have any thoughts on the interface? I was thinking of an Inference interface that supports the conducting of one- and two-tailed tests as well as constructing their complementary confidence intervals. Or, if we want to separate concerns, create both a HypothesisTest and a ConfidenceInterval interface, one for each type of inference. Either way, I would use the tried-and-true abstract factory way of creating inference instances. Comments are welcome. * numerical approximation of the t- and chi-square distributions to enable user-supplied significance levels. See above. Someone just needs to put a fork in this. Tim? Brent? Done. Including the testing interface? See below. * *new* add support for F distribution and F test, so that we can report significance level of correlation coefficient in bivariate regression / significance of model. I will do this if no one else wants to. Done. I'll probably knock out a few more easy continuous distributions to get them out of the way. * Framework and implementation strategy(ies) for finding roots of real-valued functions of one (real) variable. Here again -- largely done. I would prefer to wait until J gets back and let him submit his framework and R. Brent's algorithm. Then "our" Brent's implementation and usage can be integrated (actually not much to do, from the looks of the current code) and I will add my "bean equations" stuff (in progress). Sounds good. * Extend distribution framework to support discrete distributions and implement binomial and hypergeometric distributions. I will do this if no one else wants to. 
If someone else does it, you should make sure to use the log binomials in computations. Binomial can easily be obtained using the regularized beta function that is already defined. Hypergeometric will be a little more work as I don't think there's a compact formula to compute the cpf. Using the log binomials, direct computation of the density might not be too bad. I have not researched this, but that is what I was thinking. One thing to note: since the discrete distributions do not have nice invertible mappings from critical values to probabilities like those found for continuous distributions, how should the inverseCumulativeProbability method work? For a given probability, p, should the method return one value, x, such that x is the largest value where P(X <= x) <= p? Or the smallest value, x, where P(X <= x) >= p? Or should the method return two values, x0 and x1, such that P(X <= x0) <= p <= P(X <= x1)? I think in the discrete case, we should supply the density function (and the cumulative probabili
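[Editor's note] Brent's two inverse-cdf conventions can be shown side by side on a hypothetical discrete distribution given as a pmf array over 0..n. Purely illustrative code; names and representation are assumptions.

```java
// The two candidate conventions for a discrete inverse cumulative probability.
public final class DiscreteInverse {

    private DiscreteInverse() {}

    /** Smallest x with P(X <= x) >= p (the usual quantile definition). */
    public static int smallestX(double[] pmf, double p) {
        double cum = 0.0;
        for (int x = 0; x < pmf.length; x++) {
            cum += pmf[x];
            if (cum >= p) return x;
        }
        return pmf.length - 1;
    }

    /** Largest x with P(X <= x) <= p; -1 signals that no such x exists. */
    public static int largestX(double[] pmf, double p) {
        double cum = 0.0;
        int last = -1;
        for (int x = 0; x < pmf.length; x++) {
            cum += pmf[x];
            if (cum <= p) last = x; else break;
        }
        return last;
    }
}
```

For a uniform pmf over {0,1,2,3} and p = 0.6, the two conventions return 2 and 1 respectively, which is exactly the ambiguity the question raises.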
Re: [math] proposed ordering for task list, scope of initial release
--- Al Chou <[EMAIL PROTECTED]> wrote: > --- Phil Steitz <[EMAIL PROTECTED]> wrote: > [deletia] > > OK, long-winded disclaimer aside, here is how I see the task list ordered: [deletia] > > * Framework and implementation strategy(ies) for finding roots of > real-valued > > functions of one (real) variable. Here again -- largely done. I would > > prefer > > to wait until J gets back and let him submit his framework and R. Brent's > > algorithm. Then "our" Brent's implementation and usage can be integrated > > (actually not much to do, from the looks of the current code) and I will > add > > my "bean equations" stuff (in progress). > > I may have time to submit my Ridders' method implementation using J.'s > framework before he returns 2 days hence. Should I bother to try, or should > I > wait until he submits his code as a patch via Bugzilla? Well, I've just spent some time over the past 3 days reminding myself of some of the things that are so hard about numerics. I was testing my Ridders' method implementation and couldn't understand why it took so many iterations to converge and still not be within the requested accuracy of the known root I asked it to find. I used a simple quintic (x+1)(x+0.5)(x)(x-0.5)(x-1) as the function whose roots I want to find, and I made sure to give upper and lower bounds that I know bracket one and only one root. When trying to find the roots at x = +- 0.5 my solver had no trouble (though I didn't ask it how many of the 100 iterations it was allowed that it actually used, until later), but the root at x = 0 was never within even a factor of 15 of the requested 1e-6 accuracy even when allowed to take up to 200 iterations (actually, I used this test case first, which was what prompted me to try the larger-valued roots in case I was seeing some loss of precision or roundoff error effect). 
BTW, in the process of using Herr Pietschmann's root finder framework, I discovered a bug in setMaximalIterationCount (it sets defaultMaximalIterationCount instead of maximalIterationCount). I then decided to try Brent W.'s bisection solver, which converged to the desired root to within its requested accuracy (1e-9) in 26 or 27 iterations even for the root at x = 0. At this point I asked my Ridders' method how many iterations it took to find x = 0.5, and it said 1, and I realized that was probably because my bracket values were symmetric (or close enough) about the root, so its midpoint evaluation of the function found the root by coincidence. When I made sure the bracket values weren't symmetric about that root, I was back to 146 iterations or more and not getting to within the requested accuracy of the root location. So I pulled out Herr Pietschmann's Brent method class and tested it, and it threw an exception telling me, "Possibly multiple zeros in interval or ill conditioned function." The morals of the story are: - More-sophisticated algorithms that are supposed to converge faster don't always do so - It's easy to outsmart yourself and create code that's too finicky for non-numericist users. As someone said recently on the list, a typical user probably is more interested in an algorithm that's guaranteed to converge to a root (if there is one) than in the rate of convergence, as long as it's not too ridiculously slow. Given that we've repeatedly determined that commons-math is not to be a general numerical mathematics library, I think now that we should provide only a bisection method in the initial release (assuming we achieve one) and spend time later making our implementations of the more sophisticated algorithms more user-friendly, if we find they're even needed. I believe we've let ourselves go down the path of as-yet-unjustified optimization in our designs, because we know of algorithms that are supposed to be "better". 
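The bisection result is easy to sanity-check: bisection gains exactly one bit of accuracy per function evaluation, so for a bracket of width w and tolerance t it needs about ceil(log2(w/t)) iterations. A minimal Python sketch (names are illustrative, not Brent W.'s solver):

```python
def bisect(func, a, b, tol=1e-9, max_iter=100):
    """Plain bisection: halve the bracket until it is narrower than tol."""
    fa, fb = func(a), func(b)
    if fa * fb > 0:
        raise ValueError("root not bracketed")
    iters = 0
    while (b - a) > tol and iters < max_iter:
        m = 0.5 * (a + b)
        fm = func(m)
        iters += 1
        if fm == 0.0:
            return m, iters
        if fa * fm < 0.0:
            b, fb = m, fm
        else:
            a, fa = m, fm
    return 0.5 * (a + b), iters

# the quintic from the thread, bracketed so only the root at x = 0 is inside
quintic = lambda x: (x + 1) * (x + 0.5) * x * (x - 0.5) * (x - 1)
root, n = bisect(quintic, -0.06, 0.08, tol=1e-9)
# n comes out near ceil(log2(0.14 / 1e-9)), i.e. around 28, consistent
# with the 26-27 iterations reported above for a similar bracket
```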
I also have a greater first-hand appreciation of the subtleties in NR's code that make it more robust for the user, and I believe we can only achieve that level of robustness if we take enough time -- which we should not do prior to the initial release, because that would take too much time. Finally, having used the Pietschmann root finder framework, I think it needs some modification to make it more user-friendly. As a lay user, I would have been much happier dealing with Brent W.'s interface than Herr Pietschmann's, which was kind of cumbersome. I think, though, that with a little slimming down it would be quite workable. Al = Albert Davidson Chou Get answers to Mac questions at http://www.Mac-Mgrs.org/ .
RE: [math] proposed ordering for task list, scope of initial release
> -Original Message- > From: Phil Steitz [mailto:[EMAIL PROTECTED] > Sent: Friday, June 06, 2003 12:21 PM > To: [EMAIL PROTECTED] > Subject: [math] proposed ordering for task list, scope of initial > release > > > Here is a *proposed* ordering for the task list, with a little commentary > added. > > One thing that I want to make *very* clear up front, is that I > *never* intended > the task list or the items listed in the scope section of the > proposal to be > definitive. All that is definitive are the guiding principles, > which just try > to keep us focused on stuff that people will find both useful and > easy to use. > I expected that the actual contents of the first release would > include some > things not on the list and would exclude some of the things > there. At this > stage, as Jouzas pointed out, it is more important for us to > build community > than to rush a release out the door. So if there are things that fit the > guidelines that others would like to contribute, but which are > not on the list, > *please* suggest them. Also, for those who may not have dug into > the code, but > who may be interested in contributing, please rest assured that deep > mathematical knowledge is not required to help. We can review > implementations > and deal with mathematical problems as they arise, using our > small but growing > community as a resource. The same is obviously true on the > Java/OS tools > side -- no need to be an expert to contribute. > > OK, long-winded disclaimer aside, here is how I see the task list ordered: > > * The RealMatrixImpl class is missing some key method implementations. The > critical thing is solution of linear systems. We need to implement a > numerically sound solution algorithm. This will enable inverse() and also > support general linear regression. -- I think that Brent is > working on this. The only thing I've done is the Cholesky decomposition. I haven't done anything for the general linear system case.
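For the symmetric positive definite case that the Cholesky work covers, a solver is just forward and back substitution on the factor. A rough Python sketch for illustration (these names are not the RealMatrixImpl API, and the general system would still need LU decomposition with pivoting):

```python
import math

def cholesky(a):
    """Lower-triangular L with L * L^T == a, for symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def solve_spd(a, b):
    """Solve a x = b: forward substitution with L, back substitution with L^T."""
    L = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```

For example, solve_spd([[4.0, 2.0], [2.0, 3.0]], [10.0, 8.0]) recovers the solution of 4x + 2y = 10, 2x + 3y = 8.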
> * t-test statistic needs to be added and we should probably add > the capability > of actually performing t- and chi-square tests at fixed > significance levels > (.1, .05, .01, .001). -- This is virtually done, just need to > define a nice, > convenient interface for doing one- and two-tailed tests. Thanks > to Brent, we > can actually support user-supplied significance levels (next item) Anyone have any thoughts on the interface? I was thinking of an Inference interface that supports the conducting of one- and two-tailed tests as well as constructing their complementary confidence intervals. Or, if we want to separate concerns, create both a HypothesisTest and a ConfidenceInterval interface, one for each type of inference. Either way, I would use the tried-and-true abstract factory way of creating inference instances. Comments are welcome. > > * numerical approximation of the t- and chi-square distributions to enable > user-supplied significance levels. See above. Someone just > needs to put a > fork in this. Tim? Brent? Done. > > * *new* add support for F distribution and F test, so that we can report > significance level of correlation coefficient in bivariate regression / > significance of model. I will do this if no one else wants to. Done. I'll probably knock out a few more easy continuous distributions to get them out of the way. > > * Framework and implementation strategy(ies) for finding roots of > real-valued > functions of one (real) variable. Here again -- largely done. I > would prefer > to wait until J gets back and let him submit his framework and R. Brent's > algorithm. Then "our" Brent's implementation and usage can be integrated > (actually not much to do, from the looks of the current code) and > I will add my > "bean equations" stuff (in progress). Sounds good. > > * Extend distribution framework to support discrete distributions > and implement > binomial and hypergeometric distributions. I will do this if no > one else wants > to.
If someone else does it, you should make sure to use the log > binomials in > computations. Binomial can easily be obtained using the regularized beta function that is already defined. Hypergeometric will be a little more work as I don't think there's a compact formula to compute the cdf. One thing to note: since the discrete distributions do not have nice invertible mappings from critical values to probabilities like those found for continuous distributions, how should the inverseCumulativeProbability method work? For a given probability, p, should the method return one value, x, such that x is the largest value where P(X <= x) <= p? Or the smallest value, x, where P(X <= x) >= p? Or should the method return two values, x0 and x1, such that P(X <= x0) <= p <= P(X <= x1)? > > * Exponential growth and decay (set up for financial > applications) I think this > is just going to be a matter of finding the right formulas to add > to MathUtils. > I don't want to get carried away with financial computations, but some simple, commonly used formulas would be a nice addition to the package.
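The convention question is easier to see in code. Below is a hedged Python sketch of a binomial cdf (by direct summation here; the regularized-beta route mentioned above computes the same quantity as I_{1-p}(n-k, k+1)) together with the two candidate inverse conventions. All function names are hypothetical, not proposed API:

```python
from math import comb

def binomial_cdf(k, n, p):
    # direct summation of the pmf; fine for illustration, though a real
    # implementation would use the regularized incomplete beta function
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def inverse_cdf_smallest(prob, n, p):
    """Smallest k with P(X <= k) >= prob -- the usual quantile convention."""
    for k in range(n + 1):
        if binomial_cdf(k, n, p) >= prob:
            return k
    return n

def inverse_cdf_largest(prob, n, p):
    """Largest k with P(X <= k) <= prob -- the other convention in the thread."""
    best = -1  # hypothetical sentinel when no k qualifies
    for k in range(n + 1):
        if binomial_cdf(k, n, p) <= prob:
            best = k
    return best
```

For n = 10, p = 0.5, prob = 0.5 the two conventions return different answers (5 and 4 respectively), which is exactly the ambiguity the question above is getting at.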
Re: [math] proposed ordering for task list, scope of initial release
--- Phil Steitz <[EMAIL PROTECTED]> wrote: > >>* Improve numerical accuracy of Univariate and BivariateRegression > >>statistical > >>computations. Encapsulate basic double[] |-> double mean, variance, min, > max > >>computations using improved formulas and add these to MathUtils. (probably > >>should add float[], int[], long[] versions as well.) Then refactor all > >>univariate implementations that use stored values (including UnivariateImpl > >>with finite window) to use the improved versions. -- Mark? I am chasing > down > >>the TAS reference to document the source of the _NR_ formula, which I will > >>add > >>to the docs if someone else does the implementation. > > > > > > I was starting to code the updating (storage-less) variance formula, based > on > > the Stanford article you cited, as a patch. I believe the storage-using > > corrected two-pass algorithm is pretty trivial to code once we feel we're > on > > solid ground with the reference to cite. > > > > > OK. I finally got hold of the American Statistician article (had to > resort to the old trundle down to the local university library method) and Great! Thanks. > found lots of good stuff in it -- including a reference to Hanson's > recursive formula (from the Stanford paper) and some empirical and > theoretical results confirming that NR 14.1.8 is about the best that you > can do for the stored case. There is a refinement mentioned in which > "pairwise summation" is used (essentially splitting the sample in two > and computing the recursive sums in parallel); but the value of this I was wondering what the pairwise method was, and whether it was another name for a technique we'd already discussed. Sounds sort of like Shell's sort or other recursive divide-and-conquer algorithms. > only kicks in for large n. I propose that we use NR 14.1.8 as is for > all stored computations.
Here is good text for the reference: > > Based on the corrected two-pass algorithm for computing the > sample variance, as described in "Algorithms for Computing the Sample > Variance: Analysis and Recommendations", Tony F. Chan, Gene H. Golub and > Randall J. LeVeque, The American Statistician, 1983, Vol. 37, > No. 3. (Eq. (1.7) on page 243.) > > The empirical investigation that the authors do uses the following trick > that I have thought about using to investigate the precision in our > stuff: implement an algorithm using both floats and doubles and use the > double computations to assess stability of the algorithm implemented > using floats. Might want to play with this a little. Yes, I skimmed part of the Stanford article and noticed that test technique. It's interesting, and as you say, we may want to experiment with it to see what it can tell us. Al = Albert Davidson Chou Get answers to Mac questions at http://www.Mac-Mgrs.org/ .
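For concreteness, the storage-less updating computation under discussion can be sketched as a Welford-style recurrence (illustrative Python; not necessarily the exact Hanson formula from the Stanford paper):

```python
class RunningStats:
    """Storage-less mean and variance via a recursive update: each new value
    adjusts the running mean, then accumulates the squared deviation against
    the updated mean, which keeps the sum of squares well conditioned."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample (n - 1 denominator) variance of the values seen so far."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0
```

The point of the recurrence is that no data window needs to be stored, which is what UnivariateImpl without a finite window requires.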
Re: [math] proposed ordering for task list, scope of initial release
* Improve numerical accuracy of Univariate and BivariateRegression statistical computations. Encapsulate basic double[] |-> double mean, variance, min, max computations using improved formulas and add these to MathUtils. (probably should add float[], int[], long[] versions as well.) Then refactor all univariate implementations that use stored values (including UnivariateImpl with finite window) to use the improved versions. -- Mark? I am chasing down the TAS reference to document the source of the _NR_ formula, which I will add to the docs if someone else does the implementation. I was starting to code the updating (storage-less) variance formula, based on the Stanford article you cited, as a patch. I believe the storage-using corrected two-pass algorithm is pretty trivial to code once we feel we're on solid ground with the reference to cite. OK. I finally got hold of the American Statistician article (had to resort to the old trundle down to the local university library method) and found lots of good stuff in it -- including a reference to Hanson's recursive formula (from the Stanford paper) and some empirical and theoretical results confirming that NR 14.1.8 is about the best that you can do for the stored case. There is a refinement mentioned in which "pairwise summation" is used (essentially splitting the sample in two and computing the recursive sums in parallel); but the value of this only kicks in for large n. I propose that we use NR 14.1.8 as is for all stored computations. Here is good text for the reference: Based on the corrected two-pass algorithm for computing the sample variance, as described in "Algorithms for Computing the Sample Variance: Analysis and Recommendations", Tony F. Chan, Gene H. Golub and Randall J. LeVeque, The American Statistician, 1983, Vol. 37, No. 3. (Eq. (1.7) on page 243.)
The empirical investigation that the authors do uses the following trick that I have thought about using to investigate the precision in our stuff: implement an algorithm using both floats and doubles and use the double computations to assess stability of the algorithm implemented using floats. Might want to play with this a little. Phil
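A sketch of the corrected two-pass formula itself (Chan/Golub/LeVeque Eq. 1.7, i.e. NR 14.1.8) in Python, for reference; the correction term is analytically zero and exists only to cancel round-off in the computed mean:

```python
def corrected_two_pass_variance(xs):
    """Sample variance by the corrected two-pass algorithm: first pass
    computes the mean, second pass accumulates squared deviations, and the
    (sum of deviations)^2 / n term subtracts out round-off in the mean."""
    n = len(xs)
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    sum_sq = sum(d * d for d in devs)
    sum_dev = sum(devs)  # exactly zero in exact arithmetic
    return (sum_sq - sum_dev * sum_dev / n) / (n - 1)
```

The float/double trick described above would then amount to running this same function at both precisions and taking the difference as a stability estimate.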
Re: [math] proposed ordering for task list, scope of initial release
Al Chou wrote: [deletia] * Improve numerical accuracy of Univariate and BivariateRegression statistical computations. Encapsulate basic double[] |-> double mean, variance, min, max computations using improved formulas and add these to MathUtils. (probably should add float[], int[], long[] versions as well.) Then refactor all univariate implementations that use stored values (including UnivariateImpl with finite window) to use the improved versions. -- Mark? I am chasing down the TAS reference to document the source of the _NR_ formula, which I will add to the docs if someone else does the implementation. I was starting to code the updating (storage-less) variance formula, based on the Stanford article you cited, as a patch. I believe the storage-using corrected two-pass algorithm is pretty trivial to code once we feel we're on solid ground with the reference to cite. Yes. I just wanted to propose the refactoring. * Framework and implementation strategy(ies) for finding roots of real-valued functions of one (real) variable. Here again -- largely done. I would prefer to wait until J gets back and let him submit his framework and R. Brent's algorithm. Then "our" Brent's implementation and usage can be integrated (actually not much to do, from the looks of the current code) and I will add my "bean equations" stuff (in progress). I may have time to submit my Ridders' method implementation using J.'s framework before he returns 2 days hence. Should I bother to try, or should I wait until he submits his code as a patch via Bugzilla? I doubt that J would mind if someone else were to submit the framework (including his @author of course) from his post to the list. You could combine his classes and yours into one patch and submit it if you have time to do this before he gets back. * Polynomial Interpolation -- let Al tell us what to do here. Even better, let Al do it (he he).
I actually did some research last night (I told myself I was going to bed early, hah) on rational function interpolation, trying to find a primary source for the algorithm rather than again rely on a secondary source in the form of NR. I guess I'll continue along this path, as I really want a clean room implementation of it for my own use. I'd feel better using rational functions rather than polynomials for their generally larger radius of convergence. Thanks for looking into this. If you think rational functions are better, go for it. One more thing to think about is splines. A natural spline implementation might be easier to document/understand from users' perspective. We might want to eventually support both (and maybe even polynomial interpolation). Phil Al = Albert Davidson Chou Get answers to Mac questions at http://www.Mac-Mgrs.org/ .
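For comparison with the rational-function approach under discussion, the polynomial case is compact: Neville's algorithm evaluates the unique interpolating polynomial through n points without forming its coefficients. An illustrative Python sketch:

```python
def neville(xs, ys, x):
    """Neville's algorithm: value at x of the unique degree-(n-1) polynomial
    through the points (xs[i], ys[i]). Each pass combines adjacent partial
    interpolants over one more point."""
    p = list(ys)
    n = len(xs)
    for level in range(1, n):
        for i in range(n - level):
            p[i] = ((x - xs[i + level]) * p[i] + (xs[i] - x) * p[i + 1]) / (
                xs[i] - xs[i + level]
            )
    return p[0]
```

Through the three points (0, 0), (1, 1), (2, 4) this reproduces x^2, so evaluating at x = 3 gives 9; rational (Bulirsch-Stoer style) interpolation follows the same tableau pattern with a different combining rule.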
Re: [math] proposed ordering for task list, scope of initial release
--- Phil Steitz <[EMAIL PROTECTED]> wrote: [deletia] > OK, long-winded disclaimer aside, here is how I see the task list ordered: > > * The RealMatrixImpl class is missing some key method implementations. The > critical thing is solution of linear systems. We need to implement a > numerically sound solution algorithm. This will enable inverse() and also > support general linear regression. -- I think that Brent is working on this. > > > * Improve numerical accuracy of Univariate and BivariateRegression > statistical > computations. Encapsulate basic double[] |-> double mean, variance, min, max > computations using improved formulas and add these to MathUtils. (probably > should add float[], int[], long[] versions as well.) Then refactor all > univariate implementations that use stored values (including UnivariateImpl > with finite window) to use the improved versions. -- Mark? I am chasing down > the TAS reference to document the source of the _NR_ formula, which I will > add > to the docs if someone else does the implementation. I was starting to code the updating (storage-less) variance formula, based on the Stanford article you cited, as a patch. I believe the storage-using corrected two-pass algorithm is pretty trivial to code once we feel we're on solid ground with the reference to cite. > * Define full package structure and develop user's guide following the > package > structure. I have started work on the user's guide, but found this > impossible > without the package structure defined. I will post a separate message > summarizing what has been proposed up to now and making a recommendation. > > * t-test statistic needs to be added and we should probably add the > capability > of actually performing t- and chi-square tests at fixed significance levels > (.1, .05, .01, .001). -- This is virtually done, just need to define a nice, > convenient interface for doing one- and two-tailed tests. 
Thanks to Brent, > we > can actually support user-supplied significance levels (next item) > > * numerical approximation of the t- and chi-square distributions to enable > user-supplied significance levels. See above. Someone just needs to put a > fork in this. Tim? Brent? > > * *new* add support for F distribution and F test, so that we can report > significance level of correlation coefficient in bivariate regression / > significance of model. I will do this if no one else wants to. > > * Framework and implementation strategy(ies) for finding roots of real-valued > functions of one (real) variable. Here again -- largely done. I would > prefer > to wait until J gets back and let him submit his framework and R. Brent's > algorithm. Then "our" Brent's implementation and usage can be integrated > (actually not much to do, from the looks of the current code) and I will add > my "bean equations" stuff (in progress). I may have time to submit my Ridders' method implementation using J.'s framework before he returns 2 days hence. Should I bother to try, or should I wait until he submits his code as a patch via Bugzilla? > * Extend distribution framework to support discrete distributions and > implement > binomial and hypergeometric distributions. I will do this if no one else > wants > to. If someone else does it, you should make sure to use the log binomials > in > computations. > > * Exponential growth and decay (set up for financial applications) I think > this > is just going to be a matter of finding the right formulas to add to > MathUtils. > I don't want to get carried away with financial computations, but some > simple, > commonly used formulas would be a nice addition to the package. We should > also > be thinking about other things to add to MathUtils -- religiously adhering to > the guiding principles, of course. Al's sign() is an excellent example of the > kind of thing that we should be adding, IMHO. Thanks for the compliment!
I think I finally understand what you mean with the exponential stuff: compound interest calculation, for the most part, with continuous compounding requiring the exponential. > * Polynomial Interpolation -- let Al tell us what to do here. Even better, > let Al do it (he he). I actually did some research last night (I told myself I was going to bed early, hah) on rational function interpolation, trying to find a primary source for the algorithm rather than again rely on a secondary source in the form of NR. I guess I'll continue along this path, as I really want a clean room implementation of it for my own use. I'd feel better using rational functions rather than polynomials for their generally larger radius of convergence. Al = Albert Davidson Chou Get answers to Mac questions at http://www.Mac-Mgrs.org/ .
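The exponential formulas in question are just discrete compounding, A = P(1 + r/m)^(mt), and its continuous limit, A = Pe^(rt). A minimal sketch (function names are mine, not a proposed MathUtils API):

```python
import math

def compound(principal, rate, years, periods_per_year):
    """Discrete compounding: A = P * (1 + r/m)^(m*t)."""
    m = periods_per_year
    return principal * (1 + rate / m) ** (m * years)

def compound_continuous(principal, rate, years):
    """Continuous compounding, the m -> infinity limit: A = P * e^(r*t)."""
    return principal * math.exp(rate * years)
```

For a given rate, discrete compounding always returns a little less than the continuous limit, e.g. monthly compounding of 100 at 5% for one year versus 100 * e^0.05.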