https://bugs.documentfoundation.org/show_bug.cgi?id=154627
Bug ID: 154627
Summary: Inconsistencies in RANK() function in Calc
Product: LibreOffice
Version: 4.0.0.3 release
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: medium
Component: Calc
Assignee: [email protected]
Reporter: [email protected]
There seem to be some inconsistencies in the RANK() function in Calc as of LO
7.4.6 (in part since LO 4.0).
For the purpose of this report:
* Rank(Value;Data[;Type])
* L.Value is the minimum value in Data.
* G.Value is the maximum value in Data.
Sample file showing the problems
Attachment 186403.
A_ Columns B:C use RANK() on Data, with the respective Values taken directly
from the same Data range. Besides valid values within Data, there are also "a",
"1" (potential valid value, but not within the range of values in Data) and an
empty cell.
B_ Columns F:G use RANK.AVG().
C_ Column K is a set of possible values for the Value argument for RANK().
D_ Columns L:M use column K as Value for RANK(), with Data being A3:A7.
E_ Columns O:P are similar to "D_" but with duplicate values in Data, A3:A10.
F_ Columns S:T are like "D_" but with RANK.AVG().
G_ Columns V:W are like "F_" but with duplicate values in Data, A3:A10.
The following items are presented in increasing magnitude/importance of
problematic results (starting from "no problem" towards "serious problem").
0.1. When Value is "string" or refers to a string, then the result is #Value!.
This seems adequate.
0.2. When Value is "string" or refers to a string, and it is included in Data,
then the Ranks corresponding to (other) valid values in Data discard the
existence of the string value.
This seems adequate.
0.3. When Value refers to an empty cell, then the result is #Value!.
This seems adequate; although someone could argue in favor of a different
result such as #N/A (as other spreadsheet tools do).
1. When Value < L.Value, or when Value > G.Value, then the result is
#Value!.
IMO, it should be #N/A, as in other spreadsheet software tools.
For Values 1 and 10 in attachment 186403 (column K), the result should be
#N/A.
2. When Type is a number <> zero, and when 'L.Value < Value > G.Value',
((and) Value is not listed in Data), the result is "-1" (without quotation
marks).
I couldn't find official Release Notes on this important change, which can
affect further calculations, preventing interoperability with other spreadsheet
tools. This (distinct) result is not seen in LO 3.3, (nor in AOO, GSheets,
Excel). I would had expected some documentation (including the reasoning,
especially when it is different than other spreadsheet tools). Using other
spreadsheet software tools, this results in #N/A.
In attachment 186403, when Type is 1 and Value is between 2 and 9 (but not
them), the result (since LO 4.0) is "-1" (columns M and P). It should had been
#N/A.
3. When Type = 0 (zero), and when 'L.Value < Value > G.Value', ((and) Value
is not listed in Data), the result is still a positive integer number, which is
INCORRECT!
Similar case as point 2, but with Type = 0 (columns L and (letter) O in
attachment 186403). The result (positive numbers) is simply incorrect, and
worse than point 2. This should had been #N/A.
4. When using RANK.AVG() instead of RANK(), when 'L.Value < Value > G.Value',
((and) Value is not listed in Data), the result is still a positive number,
which is INCORRECT! Moreover, when there are duplicate values in Data, the
results can fall beyond the maximum rank.
In attachment 186403 (columns S, T, V and W), the maximum rank should had
been 5 (S:T) and 7 (V:W) respectively. Here we not only have incorrect positive
results where there shouldn't be (as in point 3) but the results can be greater
than 7.
The functions RANK(), RANK.EQ() (equivalent to RANK()) and RANK.AVG() need a
review.
--
You are receiving this mail because:
You are the assignee for the bug.