Le 18/06/2014 14:32, Gilles a écrit : > Hello Luc. > >>> >>> See >>> https://issues.apache.org/jira/browse/MATH-1129 >>> >>> The problem reported was due to the sorting procedure not behaving >>> correctly in the presence of NaN. >>> This procedure could be replaced by an equivalent method from the JDK: >>> java.util.Arrays.sort(double[],int,int) >>> >>> Any objection? >> >> If you imply removing the select, medianOf3 and partition methods, yes I >> would object. If you imply replacing only the insertionSort method used >> for small sub-arrays, then I almost agree. > > Issue 1129 concerns the latter: See the comment in "insertionSort" in the > current code. > > However, for the former, you should really have a look at MATH-1120 > https://issues.apache.org/jira/browse/MATH-1120 > The proposed patch indeed touches those methods.
So I think it is worth fixing MATH-1120 first, with the NaNStrategy and go back to MATH-1129 afterwards, to see if it still applies of if MATH-1120 also fixed it. I will have a look at the proposed patch, including your comments. > >> >> However, I'm not sure this would be sufficient as the sort is done >> partially in all the methods above. The former ones are used to split >> the big array into smaller ones and sorting only the sub-arrays that are >> needed to compute the percentile (it is really a selection algorithm, >> not a full-sort algorithm). So I guess we should look at all the methods >> at once to ensure proper handling of NaNs. The trick is that all >> comparisons involving NaN return false (<, <=, >, >=, ==, !=), >> regardless of the NaN being at the left hand side or right hand side of >> the comparison (so we get the funny consequence that if a is a NaN, the >> test a == a evaluates to false ...). > > As of r1603217, "insertionSort" seems to behave correctly (at least on the > example reported in MATH-1129). Yes, but it would be fed with garbage as the former methods fail (typically medianOf3 does not handle NaNs properly), so I guess in the general case it would clearly fail. > >> >> The fast an insertion sort is used at the end is because it is very >> simply and efficient for small arrays, which is exactly what happens >> here: the array part has MIN_SELECT_SIZE elements or less, i.e. 15 >> elements or less. > > That was my question, implicitly: is it worth not using the JDK, or IOW, > is "insertionSort" so much faster than "java.util.Arrays.sort" that it is > worth maintaining a custom implementation? Yes it is as long as it is kept simple. Quicksort is not that good at small arrays, and this method can be called a huge number of times on numerous small arrays, to it should really consider this. > >> >> As I wrote this selection algorithm, I could do the check if you want. > > What check? Checking the algorithms and fixing them using the strategy "NaN is larger everything", as already stated in the javadoc. But as MATH-1120 proposes a more complete solution and allows other strategies, I'll look at it first. best regards, Luc > > > Regards, > Gilles > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org