> On Jun 29, 2014, at 3:41 PM, Gilles <gil...@harfang.homelinux.org> wrote: > >> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote: >>> On 6/29/14, 2:30 PM, Gilles wrote: >>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote: >>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: >>>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz >>>>> <phil.ste...@gmail.com> wrote: >>>>> >>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: >>>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit : >>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe >>>>>>>> <l...@spaceroots.org> >>>>>> wrote: >>>>>>>>> Hi Venkat, >>>>>>>>> >>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit : >>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe >>>>>>>>>> <l...@spaceroots.org> >>>>>>>>> wrote: >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> While looking further in Percentile class for MATH-1120, I >>>>>>>>>>> have found >>>>>>>>>>> another problem in the current implementation. >>>>>>>>>>> NaNStrategy.FIXED >>>>>> should >>>>>>>>>>> leave the NaNs in place, but at the end of the >>>>>>>>>>> KthSelector.select >>>>>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of >>>>>>>>>>> the small >>>>>>>>>>> sub-array. What is really implemented here is therefore >>>>>>>>>>> closer to >>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always >>>>>>>>>>> occur in the >>>>>>>>>>> test cases because they use very short arrays, and we >>>>>>>>>>> directly >>>>>> switch to >>>>>>>>>>> this part of the select method. >>>>>>>>>> Are NaNs considered higher than +Inf ? >>>>>>>>>> If MAXIMAL represent replacing for +inf ; you need >>>>>>>>>> something to >>>>>>>>>> indicate beyond this for NaN. >>>>>>>>> Well, we can just keep the NaN themselves and handled them >>>>>>>>> appropriately, hoping not to trash performances too much. >>>>>>>> Agreed. >>>>>>>> >>>>>>>>>> What is the test input you see an issue and what is the >>>>>>>>>> actual error >>>>>>>>>> you are seeing. Please share the test case. >>>>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The >>>>>>>>> first check in >>>>>>>>> the test corresponds to a Percentile configuration at 50% >>>>>>>>> percentile, >>>>>>>>> and NaNStrategy.FIXED. The array has an odd number of >>>>>>>>> entries, so the >>>>>>>>> 50% percentile is exactly one of the entries: the one at >>>>>>>>> index 5 in the >>>>>>>>> final array. >>>>>>>>> >>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, >>>>>>>>> NaN, 8 }. So >>>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified >>>>>>>>> at all in >>>>>>>>> the selection algorithm and the result for 50% should be the >>>>>>>>> first NaN >>>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort, >>>>>>>>> we *do* >>>>>>>>> reorder the array into { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, >>>>>>>>> NaN }, so >>>>>>>>> the result is 5. >>>>>>>>> >>>>>>>>> Agreed. just verified by putting back the earlier insertionSort >>>>>> function. >>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we >>>>>>>>> get as a >>>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN. >>>>>>>>> >>>>>>>>> If we agree to leave FIXED as unchanged behaviour with your >>>>>> insertionSort >>>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for >>>>>>>> transformation >>>>>> of >>>>>>>> NaN to +/-Inf >>>>>>> I'm fine with it, as long as we clearly state it in the >>>>>>> NaNStrategy >>>>>>> documentation, saying somethig like: >>>>>>> >>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are effecively >>>>>>> *replaced* by >>>>>>> +/- infinity, hence the results will be computed as if >>>>>>> infinities were >>>>>>> there in the first place. >>>>>> -1 - this is mixing concerns. NaNStrategy exists for one specific >>>>>> purpose - to turn extend the ordering on doubles a total ordering. >>>>>> Strategies prescribe only how NaNs are to be treated in the >>>>>> ordering. Side effects on the input array don't make sense in >>>>>> this >>>>>> context. The use case for which this class was created was rank >>>>>> transformations. Returning infinities in rank transformations >>>>>> would >>>>>> blow up computations in these classes. If a floating point >>>>>> computations need to transform NaNs, the specific stats / other >>>>>> classes that do this transformation should perform and document >>>>>> the >>>>>> behavior. >>>>>> >>>>>> Phil >>>>> OK. Agreed realized it later. Hence i have not touched >>>>> NaNStrategy in my >>>>> patch(MATH-1132) . Now i have added a separate transformer to >>>>> transform NaNs >>>>> but it uses NanStrategy. Please let me know on this as this >>>>> trasnformation >>>>> itself >>>>> can be used in different places. >>>> >>>> Honestly, I don't see the value of this. I would be more favorable >>>> to an addition to MathArrays supporting NaN (or infinity) filtering >>>> / transformation. Several years back we made the decision to just >>>> let NaNs propagate in computations, which basically means that if >>>> you feed NaNs to [math] you are going to get NaNs out in results. >>>> If we do get into the business of filtering NaNs (or infinities for >>>> that matter), I think it is probably best to do that in MathArrays >>>> or somesuch - i.e., don't decorate APIs with NaN or >>>> infinity-handling handling descriptions. That would be a huge task >>>> and would likely end up bleeding implementation detail into the >>>> APIs. As I said above, NaNStrategy has to exist for rank >>>> transformations to be well-defined. NaN-handling in floating point >>>> computations is defined and as long as we fully document what our >>>> algorithms do, I think it is fair enough to "let the NaNs fall where >>>> they may" - which basically means users need to do the filtering / >>>> transformation themselves. >>> >>> So, do I understand correctly that it's OK to add the "remove" and >>> "replace" utility methods (cf. MATH-1130) in "MathArrays"? >>> Or does this thread conclude that we don't have a use for this within >>> Commons Math? >> >> You raise a good point here. Personally, I don't see an internal >> use and if we are sticking with MathArrays including only things we >> have internal use for, I would say no, don't include them. > > A third way to look at it would be that since some algorithms require > being passed "clean" data (e.g. without NaNs), we could provide some > simple means to do so. A user could then write, e.g. > --- > double[] data = { 1, Double.NaN, 3, 6, 5 }; > > Percentile p = new Percentile(); > double q = p.evaluate(MathArrays.replace(data, Double.NaN, > Double.POSITIVE_INFINITY), 0.1); > --- > > Then, if we provide that utility, is it still necessary to complicate the > Percentile > code with a NaNStrategy parameter?
I may be missing something, but it seems to me that percentile does need a NaNStrategy in the presence if NaNs like the other order statistics do (basically to define their position In the ordering). It doesn't logically need more than that. I would be ok defaulting the strategy or even returning NaN in the presence of NaNs (or whenever interpolation involves a NaN). All of this can be documented. I still don't see a within-math use for recoding methods; though I do see the value from a user perspective. The problem is if we open this door, we risk MathArrays becoming a mini commons lang. Phil > > > Gilles > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org