On Sun, Jun 29, 2014 at 10:55 PM, Phil Steitz <phil.ste...@gmail.com> wrote:
> On 6/29/14, 9:48 AM, venkatesha murthy wrote: > > On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz <phil.ste...@gmail.com> > wrote: > > > >> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: > >>> Le 25/06/2014 06:05, venkatesha murthy a écrit : > >>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <l...@spaceroots.org> > >> wrote: > >>>>> Hi Venkat, > >>>>> > >>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit : > >>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <l...@spaceroots.org > > > >>>>> wrote: > >>>>>>> Hi all, > >>>>>>> > >>>>>>> While looking further in Percentile class for MATH-1120, I have > found > >>>>>>> another problem in the current implementation. NaNStrategy.FIXED > >> should > >>>>>>> leave the NaNs in place, but at the end of the KthSelector.select > >>>>>>> method, a call to Arrays.sort moves the NaNs to the end of the > small > >>>>>>> sub-array. What is really implemented here is therefore closer to > >>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in > the > >>>>>>> test cases because they use very short arrays, and we directly > >> switch to > >>>>>>> this part of the select method. > >>>>>> Are NaNs considered higher than +Inf ? > >>>>>> If MAXIMAL represent replacing for +inf ; you need something to > >>>>>> indicate beyond this for NaN. > >>>>> Well, we can just keep the NaN themselves and handled them > >>>>> appropriately, hoping not to trash performances too much. > >>>>> > >>>> Agreed. > >>>> > >>>>>> What is the test input you see an issue and what is the actual error > >>>>>> you are seeing. Please share the test case. > >>>>> Just look at PercentileTest.testReplaceNanInRange(). The first check > in > >>>>> the test corresponds to a Percentile configuration at 50% percentile, > >>>>> and NaNStrategy.FIXED. The array has an odd number of entries, so the > >>>>> 50% percentile is exactly one of the entries: the one at index 5 in > the > >>>>> final array. > >>>>> > >>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So > >>>>> for the NaNStrategy.FIXED setting, it should not be modified at all > in > >>>>> the selection algorithm and the result for 50% should be the first > NaN > >>>>> of the array, at index 5. In fact, due to the Arrays.sort, we *do* > >>>>> reorder the array into { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so > >>>>> the result is 5. > >>>>> > >>>>> Agreed. just verified by putting back the earlier insertionSort > >> function. > >>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a > >>>>> result Double.POSITIVE_INFINITY instead of Double.NaN. > >>>>> > >>>>> If we agree to leave FIXED as unchanged behaviour with your > >> insertionSort > >>>> code; then atleast MAXIMAL/MINIMAL should be allowed for > transformation > >> of > >>>> NaN to +/-Inf > >>> I'm fine with it, as long as we clearly state it in the NaNStrategy > >>> documentation, saying somethig like: > >>> > >>> When MAXIMAL/MINIMAL are used, the NaNs are effecively *replaced* by > >>> +/- infinity, hence the results will be computed as if infinities were > >>> there in the first place. > >> -1 - this is mixing concerns. NaNStrategy exists for one specific > >> purpose - to turn extend the ordering on doubles a total ordering. > >> Strategies prescribe only how NaNs are to be treated in the > >> ordering. Side effects on the input array don't make sense in this > >> context. The use case for which this class was created was rank > >> transformations. Returning infinities in rank transformations would > >> blow up computations in these classes. If a floating point > >> computations need to transform NaNs, the specific stats / other > >> classes that do this transformation should perform and document the > >> behavior. > >> > >> Phil > >> > > OK. Agreed realized it later. Hence i have not touched NaNStrategy in my > > patch(MATH-1132) . Now i have added a separate transformer to transform > NaNs > > but it uses NanStrategy. Please let me know on this as this > trasnformation > > itself > > can be used in different places. > > Honestly, I don't see the value of this. I would be more favorable > to an addition to MathArrays supporting NaN (or infinity) filtering > / transformation. Several years back we made the decision to just > let NaNs propagate in computations, which basically means that if > you feed NaNs to [math] you are going to get NaNs out in results. > If we do get into the business of filtering NaNs (or infinities for > that matter), I think it is probably best to do that in MathArrays > or somesuch - i.e., don't decorate APIs with NaN or > infinity-handling handling descriptions. That would be a huge task > and would likely end up bleeding implementation detail into the > APIs. As I said above, NaNStrategy has to exist for rank > transformations to be well-defined. NaN-handling in floating point > computations is defined and as long as we fully document what our > algorithms do, I think it is fair enough to "let the NaNs fall where > they may" - which basically means users need to do the filtering / > transformation themselves. > > Phil > OK. i see the point. > > > >>>>>>> When I implemented the method, I explicitly avoided calling > >> Arrays.sort > >>>>>>> because it is a general purpose method that is know to be efficient > >> only > >>>>>>> for arrays of at least medium size. In most cases, when the array > is > >>>>>>> small one falls back to a non-recursive method like a very simple > >>>>>>> insertion sort, which is faster for smaller arrays. > >>>>>> Please help me understand here; even java primitive Arrays.sort does > >>>>>> an insertion sort for less than 7 elements > >>>>>> (Refer sort1(double x[], int off, int len)) > >>>>>> So what is it that the custom insertion sort that does differently > or > >>>>>> is advantageous. Is it the value 15 elements? > >>>>> I don't see a reference to 7 elements, neither in the Java6 nor in > the > >>>>> Java 7 doc > >>>> Please take a look at the sort1 method where there is a first block in > >> the > >>>> code which clearly mentions len < 7 > >>>> /** > >>>> * Sorts the specified sub-array of doubles into ascending order. > >>>> */ > >>>> private static void sort1(double x[], int off, int len) { > >>>> // Insertion sort on smallest arrays > >>>> if (len < 7) { > >>>> for (int i=off; i<len+off; i++) > >>>> for (int j=i; j>off && x[j-1]>x[j]; j--) > >>>> swap(x, j, j-1); > >>>> return; > >>>> } > >>>> : > >>>> : > >>>> : code continues for the else part > >>>> > >>>> Also the grepcode url > >>>> < > >> > http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29 > >>>> indicates the same > >>> OK, you were refering to a specific implementation. I was thinking in > >>> the general case. > >>> > >>>> (and in any case the doc explicitly states the algorithms > >>>>> explained are only implementation notes and are not part of the > >>>>> specification). > >>>>> > >>>> Yes its a part of comments anyways. > >>>> > >>>>> However, the most important part for now is the fact that we control > it > >>>>> and may be able to implement different NaN strategies. What we have > >>>>> currently fails. > >>>>> > >>>>> I agree on this and hence here is my take: > >>>> Leave FIXED as-is and use the earlier insertionSort code (just change > >> the > >>>> name to sort rather than hardcoding it as insertionsort) to handle the > >> case > >>>> you were mentioning > >>> If we leave FIXED it as it is done know, even with insertionSort we do > >>> not really control what happens. It is deterministic as the pivoting > and > >>> if/else structure is well defined (both in the selection part and in > the > >>> final sort for sub-arrays), but it is untrackable so we can't document > >> it. > >>>> Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way > >> we > >>>> have covered both Inf and nan cases. > >>> OK. > >>> > >>>> Use REMOVED as default for all Percentile Estimation Types. (mostly > >>>> influenced by R here perhaps) > >>> This would be fine with me. I have concerns with FIXED only, the other > >>> strategies all seem quite realistic. > >>> > >>> Does anybody else has an advice for FIXED? What are the use case? > >>> > >>> best regards, > >>> Luc > >>> > >>>> best regards, > >>>>> Luc > >>>>> > >>>>>>> In the select > >>>>>>> operation, we know we have small sub-arrays at the call point. > Going > >>>>>>> back to the former insertionSort would recover the good behavior > for > >>>>>>> small arrays, but would in fact not be sufficient to really > >> implement a > >>>>>>> NaNStrategy.FIXED. I guess it would be simple to make it behave > like > >>>>>>> NaNStrategy.MAXIMAL but I did not try yet. > >>>>>>> > >>>>>>> My point here is rather: can we really and should we really > implement > >>>>>>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to > >>>>>>> store the original position of the NaNs. It is quite cumbersome. > >>>>>>> > >>>>>>> I wonder what is the use case for NaNStrategy.FIXED at all. > >>>>>>> > >>>>>>> Going further and looking at the use cases for other NaNStrategy > >> values, > >>>>>>> the NaNs are replaced by +/- infinity before sorting, which is OK > for > >>>>>>> ranking as we only rely on the indices, but we use the values > >> themselves > >>>>>>> in Percentile. So sometimes, we end up with computing > interpolations > >>>>>>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number > and > >>>>>>> x[k+1] has been changed to +infinity, we get +infinity, instead of > >> the > >>>>>>> NaN we should have retrieved without replacement. So here again, > I'm > >> not > >>>>>>> sure we are doing the right thing. > >>>>>>> > >>>>>>> What do you think? > >>>>>>> > >>>>>>> best regards, > >>>>>>> Luc > >>>>>>> > >>>>>>> > --------------------------------------------------------------------- > >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > >>>>>>> For additional commands, e-mail: dev-h...@commons.apache.org > >>>>>>> > >>>>>> > --------------------------------------------------------------------- > >>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > >>>>>> For additional commands, e-mail: dev-h...@commons.apache.org > >>>>>> > >>>>>> > >>>>>> > >>>>> --------------------------------------------------------------------- > >>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > >>>>> For additional commands, e-mail: dev-h...@commons.apache.org > >>>>> > >>>>> > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > >>> For additional commands, e-mail: dev-h...@commons.apache.org > >>> > >>> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > >> For additional commands, e-mail: dev-h...@commons.apache.org > >> > >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > >