AlexanderSaydakov commented on PR #14334:
URL: https://github.com/apache/druid/pull/14334#issuecomment-1560339901
We hesitated for some time, but finally decided that inclusive mode is a bit
better. This is a major version change with some API incompatibility, so if
there is ever a right time for the change, this is it.
The difference is in the definition of rank. Suppose we are analyzing a
distribution of some items exactly. The only thing required is a comparator of
items (a "less than" operator). We sort the items and define the rank of an
item as the fraction of all items that are strictly less than that item in
exclusive mode, or less than or equal to that item in inclusive mode. The
inclusive definition seems to be more common in the literature and is slightly
better behaved in some edge cases.
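As a rough illustration of the two definitions for the exact (non-sketch) case, here is a minimal Python sketch. The function name `exact_rank` and its `inclusive` parameter are made up for this example and are not part of any library API.

```python
from bisect import bisect_left, bisect_right

def exact_rank(sorted_items, item, inclusive=True):
    """Exact normalized rank of `item` in an already sorted list.

    Hypothetical helper for illustration only.
    inclusive=True  -> fraction of items <= item
    inclusive=False -> fraction of items <  item
    """
    if not sorted_items:
        raise ValueError("rank is undefined for an empty collection")
    if inclusive:
        count = bisect_right(sorted_items, item)  # number of items <= item
    else:
        count = bisect_left(sorted_items, item)   # number of items < item
    return count / len(sorted_items)

items = sorted([10, 20, 20, 30])
print(exact_rank(items, 20, inclusive=True))   # 0.75
print(exact_rank(items, 20, inclusive=False))  # 0.25
```

The two modes differ only by the share of items equal to the queried item, which is why the gap shrinks as the number of distinct items grows.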
To illustrate the difference, suppose we have just one item. Its rank is 1 in
inclusive mode but 0 in exclusive mode. With millions of items, the
difference in rank will be tiny and most probably negligible. If we build a
histogram or partition the data, items on the edges (those equal to a boundary
value) can fall into the bin or partition on the left or on the right,
depending on the mode.
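The edge effect can be seen with a small exact histogram. This is only a sketch under the assumption that inclusive mode treats each bin as closed on its upper boundary and exclusive mode treats it as closed on its lower boundary. `histogram_counts` and `split_points` are hypothetical names, not the sketch library's API.

```python
from bisect import bisect_left, bisect_right

def histogram_counts(items, split_points, inclusive=True):
    """Count items per bin for sorted `split_points` (hypothetical helper).

    inclusive=True  -> an item equal to a split point stays in the bin on the left
    inclusive=False -> an item equal to a split point goes to the bin on the right
    """
    counts = [0] * (len(split_points) + 1)
    for x in items:
        if inclusive:
            idx = bisect_left(split_points, x)   # number of split points < x
        else:
            idx = bisect_right(split_points, x)  # number of split points <= x
        counts[idx] += 1
    return counts

items = [1, 2, 2, 3, 4]
print(histogram_counts(items, [2, 3], inclusive=True))   # [3, 1, 1]
print(histogram_counts(items, [2, 3], inclusive=False))  # [1, 2, 2]
```

Only the items equal to 2 or 3 change bins; everything strictly between boundaries is counted the same way in both modes.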