[jira] [Comment Edited] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula

Gilles (JIRA) Tue, 17 Jun 2014 09:55:27 -0700

    [ 
https://issues.apache.org/jira/browse/MATH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033121#comment-14033121
 ]


Gilles edited comment on MATH-1120 at 6/17/14 4:53 PM:
-------------------------------------------------------

bq. [...] eliminated AtomicInteger [...]

Good. ;)

But I noticed new little problems.

Why not have two separate methods: "recodeNaN" and "removeNaN"?
The current "recodeNaNs" mixes both, which I think is not necessary.

Method "getWorkArray" uses exceptions to control flow. Why not directly proceed 
with handling (i.e. without raising an exception first)?
Like in the following (untested) code (using the new methods proposed above).
{code}
switch (nanStrategy) {
  case MAXIMAL:
       return recodeNaN(temp, Double.POSITIVE_INFINITY);
  case MINIMAL:
       return recodeNaN(temp, Double.NEGATIVE_INFINITY);
  case FAILED:
       checkNotNaN(temp); // Should throw when it encounters a NaN
       return temp;
  case REMOVED:
      return removeNaN(temp);
  default:
      return temp;
}
{code}

At first sight, I do not think that it is a good idea to have several _default_ 
filtering strategies (i.e. an implicit startegy that varies with the {{Type}}).
I'm not even sure that filtering should happen at this point (i.e. in 
{{Percentile}}). IMHO, if an algorithm cannot handle some values, they should 
have been filtered out first.
It could be construed that an in-place filtering could be more space-efficient, 
but that's not the case here since "recodeNaNs" allocates a new array.
These design decisions should be brought to the "dev" ML.



was (Author: erans):
bq. [...] eliminated AtomicInteger [...]

Good. ;)

But I noticed new little problems.

Why not have two separate methods: "recodeNaN" and "removeNaN"?
The current "recodeNaNs" mixes both, which I think is not necessary.

Method "getWorkArray" uses exceptions to control flow. Why not directly proceed 
with handling (i.e. without raising an exception first)?
Like in the following (untested) code (using the new methods proposed above).
{code}
switch (nanStrategy) {
  case MAXIMAL:
       return recodeNaN(temp, Double.POSITIVE_INFINITY);
  case MAXIMAL:
       return recodeNaN(temp, Double.NEGATIVE_INFINITY);
  case FAILED:
       checkNotNaN(temp); // Should throw when it encounters a NaN
       return temp;
  case REMOVED:
      return removeNaN(temp);
  default:
      return temp;
}
{code}

At first sight, I do not think that it is a good idea to have several _default_ 
filtering strategies (i.e. an implicit startegy that varies with the {{Type}}).
I'm not even sure that filtering should happen at this point (i.e. in 
{{Percentile}}). IMHO, if an algorithm cannot handle some values, they should 
have been filtered out first.
It could be construed that an in-place filtering could be more space-efficient, 
but that's not the case here since "recodeNaNs" allocates a new array.
These design decisions should be brought to the "dev" ML.


> Need Percentile computations that can be matched with standard spreadsheet 
> formula
> ----------------------------------------------------------------------------------
>
>                 Key: MATH-1120
>                 URL: https://issues.apache.org/jira/browse/MATH-1120
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.2
>            Reporter: Venkatesha Murthy TS
>              Labels: Percentile
>             Fix For: 4.0
>
>         Attachments: excel-percentile-patch, 
> percentile-with-estimation-patch, r-output.txt
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> The current Percentile implementation assumes and hard-codes the quantile pth 
> position as 
> p * (N+1)/100 and provides a kth selected value.
> However if we need to verify compare/contrast with standard statistical tools 
> such as say MS Excel; it would be good to provide an extensible way of 
> morphing this selection of position than hard code.
> For example in order to generate the percentile closely matching with MS 
> Excel the position required may be [p*(N-1)/100]+1.
> Please let me know if i could submit this as a patch.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Comment Edited] (MATH-1120) Need Percentile computations that can be matched with standard spreadsheet formula

Reply via email to