On 7/13/14, 2:40 AM, Luc Maisonobe wrote: > Hi all, > > Le 01/07/2014 08:52, Luc Maisonobe a écrit : >> Le 30/06/2014 22:00, Phil Steitz a écrit : >>> On 6/30/14, 12:33 PM, Luc Maisonobe wrote: >>>> Le 30/06/2014 19:15, Phil Steitz a écrit : >>>>>> On Jun 30, 2014, at 9:59 AM, Gilles <gil...@harfang.homelinux.org> >>>>>> wrote: >>>>>> >>>>>> On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote: >>>>>>>> On Jun 29, 2014, at 3:41 PM, Gilles >>>>>>>> <gil...@harfang.homelinux.org> wrote: >>>>>>>> >>>>>>>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote: >>>>>>>>>>> On 6/29/14, 2:30 PM, Gilles wrote: >>>>>>>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote: >>>>>>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: On Sun, >>>>>>>>>>>> Jun 29, 2014 at 1:25 AM, Phil Steitz >>>>>>>>>>>> <phil.ste...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: Le >>>>>>>>>>>>>> 25/06/2014 06:05, venkatesha murthy a écrit : >>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe >>>>>>>>>>>>>>> <l...@spaceroots.org> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> Hi Venkat, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit >>>>>>>>>>>>>>>> : >>>>>>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc >>>>>>>>>>>>>>>>> Maisonobe <l...@spaceroots.org> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> While looking further in Percentile class >>>>>>>>>>>>>>>>>> for MATH-1120, I have found another problem >>>>>>>>>>>>>>>>>> in the current implementation. >>>>>>>>>>>>>>>>>> NaNStrategy.FIXED >>>>>>>>>>>>> should >>>>>>>>>>>>>>>>>> leave the NaNs in place, but at the end of >>>>>>>>>>>>>>>>>> the KthSelector.select method, a call to >>>>>>>>>>>>>>>>>> Arrays.sort moves the NaNs to the end of >>>>>>>>>>>>>>>>>> the small sub-array. What is really >>>>>>>>>>>>>>>>>> implemented here is therefore closer to >>>>>>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. >>>>>>>>>>>>>>>>>> This always occur in the test cases because >>>>>>>>>>>>>>>>>> they use very short arrays, and we >>>>>>>>>>>>>>>>>> directly >>>>>>>>>>>>> switch to >>>>>>>>>>>>>>>>>> this part of the select method. >>>>>>>>>>>>>>>>> Are NaNs considered higher than +Inf ? If >>>>>>>>>>>>>>>>> MAXIMAL represent replacing for +inf ; you >>>>>>>>>>>>>>>>> need something to indicate beyond this for >>>>>>>>>>>>>>>>> NaN. >>>>>>>>>>>>>>>> Well, we can just keep the NaN themselves and >>>>>>>>>>>>>>>> handled them appropriately, hoping not to trash >>>>>>>>>>>>>>>> performances too much. >>>>>>>>>>>>>>> Agreed. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> What is the test input you see an issue and >>>>>>>>>>>>>>>>> what is the actual error you are seeing. >>>>>>>>>>>>>>>>> Please share the test case. >>>>>>>>>>>>>>>> Just look at >>>>>>>>>>>>>>>> PercentileTest.testReplaceNanInRange(). The >>>>>>>>>>>>>>>> first check in the test corresponds to a >>>>>>>>>>>>>>>> Percentile configuration at 50% percentile, and >>>>>>>>>>>>>>>> NaNStrategy.FIXED. The array has an odd number >>>>>>>>>>>>>>>> of entries, so the 50% percentile is exactly >>>>>>>>>>>>>>>> one of the entries: the one at index 5 in the >>>>>>>>>>>>>>>> final array. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, >>>>>>>>>>>>>>>> NaN, 5, 7, NaN, 8 }. So for the >>>>>>>>>>>>>>>> NaNStrategy.FIXED setting, it should not be >>>>>>>>>>>>>>>> modified at all in the selection algorithm and >>>>>>>>>>>>>>>> the result for 50% should be the first NaN of >>>>>>>>>>>>>>>> the array, at index 5. In fact, due to the >>>>>>>>>>>>>>>> Arrays.sort, we *do* reorder the array into { >>>>>>>>>>>>>>>> 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so the >>>>>>>>>>>>>>>> result is 5. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Agreed. just verified by putting back the >>>>>>>>>>>>>>>> earlier insertionSort >>>>>>>>>>>>> function. >>>>>>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile >>>>>>>>>>>>>>>> above 67%, we get as a result >>>>>>>>>>>>>>>> Double.POSITIVE_INFINITY instead of >>>>>>>>>>>>>>>> Double.NaN. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If we agree to leave FIXED as unchanged >>>>>>>>>>>>>>>> behaviour with your >>>>>>>>>>>>> insertionSort >>>>>>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be >>>>>>>>>>>>>>> allowed for transformation >>>>>>>>>>>>> of >>>>>>>>>>>>>>> NaN to +/-Inf >>>>>>>>>>>>>> I'm fine with it, as long as we clearly state it in >>>>>>>>>>>>>> the NaNStrategy documentation, saying somethig >>>>>>>>>>>>>> like: >>>>>>>>>>>>>> >>>>>>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are >>>>>>>>>>>>>> effecively *replaced* by +/- infinity, hence the >>>>>>>>>>>>>> results will be computed as if infinities were >>>>>>>>>>>>>> there in the first place. >>>>>>>>>>>>> -1 - this is mixing concerns. NaNStrategy exists for >>>>>>>>>>>>> one specific purpose - to turn extend the ordering on >>>>>>>>>>>>> doubles a total ordering. Strategies prescribe only >>>>>>>>>>>>> how NaNs are to be treated in the ordering. Side >>>>>>>>>>>>> effects on the input array don't make sense in this >>>>>>>>>>>>> context. The use case for which this class was >>>>>>>>>>>>> created was rank transformations. Returning >>>>>>>>>>>>> infinities in rank transformations would blow up >>>>>>>>>>>>> computations in these classes. If a floating point >>>>>>>>>>>>> computations need to transform NaNs, the specific >>>>>>>>>>>>> stats / other classes that do this transformation >>>>>>>>>>>>> should perform and document the behavior. >>>>>>>>>>>>> >>>>>>>>>>>>> Phil >>>>>>>>>>>> OK. Agreed realized it later. Hence i have not >>>>>>>>>>>> touched NaNStrategy in my patch(MATH-1132) . Now i have >>>>>>>>>>>> added a separate transformer to transform NaNs but it >>>>>>>>>>>> uses NanStrategy. Please let me know on this as this >>>>>>>>>>>> trasnformation itself can be used in different places. >>>>>>>>>>> Honestly, I don't see the value of this. I would be more >>>>>>>>>>> favorable to an addition to MathArrays supporting NaN (or >>>>>>>>>>> infinity) filtering / transformation. Several years back >>>>>>>>>>> we made the decision to just let NaNs propagate in >>>>>>>>>>> computations, which basically means that if you feed NaNs >>>>>>>>>>> to [math] you are going to get NaNs out in results. If we >>>>>>>>>>> do get into the business of filtering NaNs (or infinities >>>>>>>>>>> for that matter), I think it is probably best to do that >>>>>>>>>>> in MathArrays or somesuch - i.e., don't decorate APIs >>>>>>>>>>> with NaN or infinity-handling handling descriptions. >>>>>>>>>>> That would be a huge task and would likely end up >>>>>>>>>>> bleeding implementation detail into the APIs. As I said >>>>>>>>>>> above, NaNStrategy has to exist for rank transformations >>>>>>>>>>> to be well-defined. NaN-handling in floating point >>>>>>>>>>> computations is defined and as long as we fully document >>>>>>>>>>> what our algorithms do, I think it is fair enough to "let >>>>>>>>>>> the NaNs fall where they may" - which basically means >>>>>>>>>>> users need to do the filtering / transformation >>>>>>>>>>> themselves. >>>>>>>>>> So, do I understand correctly that it's OK to add the >>>>>>>>>> "remove" and "replace" utility methods (cf. MATH-1130) in >>>>>>>>>> "MathArrays"? Or does this thread conclude that we don't >>>>>>>>>> have a use for this within Commons Math? >>>>>>>>> You raise a good point here. Personally, I don't see an >>>>>>>>> internal use and if we are sticking with MathArrays including >>>>>>>>> only things we have internal use for, I would say no, don't >>>>>>>>> include them. >>>>>>>> A third way to look at it would be that since some algorithms >>>>>>>> require being passed "clean" data (e.g. without NaNs), we could >>>>>>>> provide some simple means to do so. A user could then write, >>>>>>>> e.g. --- double[] data = { 1, Double.NaN, 3, 6, 5 }; >>>>>>>> >>>>>>>> Percentile p = new Percentile(); double q = >>>>>>>> p.evaluate(MathArrays.replace(data, Double.NaN, >>>>>>>> Double.POSITIVE_INFINITY), 0.1); --- >>>>>>>> >>>>>>>> Then, if we provide that utility, is it still necessary to >>>>>>>> complicate the Percentile code with a NaNStrategy parameter? >>>>>>> I may be missing something, but it seems to me that percentile >>>>>>> does need a NaNStrategy in the presence if NaNs like the other >>>>>>> order statistics do (basically to define their position In the >>>>>>> ordering). It doesn't logically need more than that. I would >>>>>>> be ok defaulting the strategy or even returning NaN in the >>>>>>> presence of NaNs (or whenever interpolation involves a NaN). All >>>>>>> of this can be documented. >>>>>> I am _surely_ missing something because I don't figure out how NaNs >>>>>> can be useful when computing those statistics... :-} >>>>> Good point. Basically the different strategies allow you to >>>>> effectively ignore or work around them so stats can be well-defined. >>>>> Without NaNStrategy, order statistics including NaNs are undefined. >>>> The problem with Percentile is that the elements are both reordered >>>> *and* used when interpolating between two values. Tor the rank methods, >>>> only reordering was needed so the various NaNStrategies could be >>>> implemented by replacing at will with +/- infinity and a classical sort >>>> would do the ordering for us. >>>> >>>> Here, we do interpolate on the values, so if we first replace NaN with >>>> +infinity to put them at the end, we may interpolate between a finite >>>> value and +inf and have +inf as a result, whereas if the NaN "value" was >>>> preserved and some magic used to sort the NaN, we would end up >>>> interpolating with a NaN and get a NaN as a result. Setting up the magic >>>> for MINIMAL and MAXIMAL seems achievable withing the sort, as we already >>>> have a specific sort (in fact not a sort but a selection algorithm, >>>> which is the core of the class). Setting up the magic for FIXED seems >>>> more difficult. >>>> >>>> So I guess implementing NaNStrategies may be done in different ways >>>> depending on how they will be used by the calling algorithm. I don't yet >>>> know if we can set up a general method to be used for all statistics. >>>> Looking only at Percentile, replacement with +/- infinity already seems >>>> to not be an appropriate solution. >>> I don't think we should try to redefine algorithms so they can >>> handle NaNs. When you do floating point computations with NaNs, you >>> get NaNs. That should be understood to be the contract. For rank >>> statistics, such as Percentile, we need to extend the ordering on >>> doubles to include them in ordering for the algorithm to be >>> well-defined. If we subsequently do floating point computations >>> with the NaNs, the result should be NaN. I think we should either >>> just simplify things and say we *always* return NaN whenever NaNs >>> are present for Percentile, or do the following: >>> >>> 0) use the prescribed NaN strategy to determine the ordering - so >>> NaNs end up either at the bottom, at the top or in their original >>> position (FIXED). >>> 1) if interpolation involves a NaN element, the result returned is NaN. >> This is exactly what I say. If NaNs are in, NaNs should be out. As for >> now we do replace them, we get infinities instead of NaNs. I would >> prefer getting NaNs there as it is more consistent with the purpose of >> NaNs in floating point encoding. >> >> So I think we agree: NaNs should not be changed, but they should be >> reordered. Perhaps we could have something simple like NaNStrategy >> having a method: >> >> boolean shouldSwap(int i1, double d1, int i2, double i2) >> >> that would use both the index and values of the element to check if they >> should be swapped or not. This method would be used both for traditional >> sorting for ranking and in selection algorithms for percentile. I first >> thought we should simply have NaN implement Comparator<Double> but this >> would not work for FIXED, whereas the above method would work (simply >> returning false as soon as there is a NaN, regardless of it being at a >> lower or upper index). >> >> I will take a look at this option. > This does not work as is. The problem is lack of transitivity. Suppose > that we have three indices i < j < k, with a[i] > a[k] and a[j] = NaN. > > Then if during the sort/selection we compare elements at indices i and k > first, we will swap them, which is correct behaviour regardless of the > NaN strategy used. If we compare elemntes at indices i and j first and > then elements at indices j and k and we use FIXED NaNStrategy, then > we won't swap the two elements and the middle NaN would induced a > non-globally sorted array. I think FIXED should act as a river flowing > over rocks : the water (non-NaN elements) would move freely around and > be sorted, and the rocks (NaN elements) would stay were they are but do > not prevent water flow.
That is what FIXED is supposed to mean. If the original array is x_0, ..., x_n and x_i is NaN, then after applying the FIXED NaNStrategy to sort the array, x_i will still be NaN. If this is inconvenient to support for Percentile, I would be fine with making selection of this NaNStrategy illegal - i.e, throwing IAE if it is specified. As I said before, I would also be OK with making the presence of NaN values in the input trigger (documented) IAE. Phil > > I have found another way to handle this, using an indirection array that > could "jump" over NaNs for FIXED strategy or push them to either end for > MINIMAL or MAXIMAL strategy. > > I will try this in the next few days, as time permit. > > best regards, > Luc > >> best regards, >> Luc >> >>> Phil >>>> best regards, >>>> Luc >>>> >>>>> Phil >>>>> >>>>>>> I still don't see a within-math use for recoding methods; though >>>>>>> I do see the value from a user perspective. The problem is if we >>>>>>> open this door, we risk MathArrays becoming a mini commons lang. >>>>>> If Commons Lang already provides a feature, then we indeed don't >>>>>> need to offer the same within CM (if there is no internal use). Can >>>>>> someone confirm? >>>>>> >>>>>> >>>>>> Otherwise we could define a new inner interface in MathArrays, >>>>>> similar to "MathArrays.Function": --- public class MathArrays { >>>>>> >>>>>> // ... >>>>>> >>>>>> public interface Transformer { /** Transform the whole array */ >>>>>> double[] transform(double[] input); /** Transform the array in the >>>>>> interval [from, to) and return the whole array */ double[] >>>>>> transform(double[] input, int from, int to); /** Transform the >>>>>> array in the interval [from, to) and return the transformed >>>>>> sub-array */ double[] transformAndSplice(double[] input, int from, >>>>>> int to); } } --- >>>>>> >>>>>> Then we can provide "replace" and "remove" transformers through >>>>>> factory methods (example/untested code): --- public class >>>>>> MathArrays { >>>>>> >>>>>> // ... >>>>>> >>>>>> public static Transformer createReplaceTransformer(final double >>>>>> old, final double replace) { return new Transformer() { public >>>>>> double[] transform(double[] input) { final int len = input.length; >>>>>> final double[] output = new double[len]; for (int i = 0; i < len; >>>>>> i++) { output[i] = Precision.equalsIncludingNaN(input[i], old) ? >>>>>> replace : input[i]; } return output; } >>>>>> >>>>>> // etc. }; } } --- >>>>>> >>>>>> >>>>>> Gilles >>>>>> >>>>>> >>>>>> --------------------------------------------------------------------- >>>>>> >>>>>> >>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>>> >>>>> --------------------------------------------------------------------- >>>>> >>>>> >>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>> >>>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>> >>>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>> For additional commands, e-mail: dev-h...@commons.apache.org >>> >>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org