felipecrv commented on code in PR #37380:
URL: https://github.com/apache/arrow/pull/37380#discussion_r1312038163


##########
python/pyarrow/_compute.pyx:
##########
@@ -2145,7 +2145,8 @@ class QuantileOptions(_QuantileOptions):
     Parameters
     ----------
     q : double or sequence of double, default 0.5
-        Quantiles to compute. All values must be in [0, 1].
+        Probability levels of the quantiles to compute. All values must be in

Review Comment:
   "Probability level" is new terminology and could mean anything -- that is 
why it seems to work as a replacement for the ambiguous "quantile" here.
   
   "quantile" can refer to the calculated value and the desired quantile as a 
number between `0.0` and `1.0`. Isn't the comment saying they are between 0.0 
and 1.0 enough to disambiguate?
   
   The original t-Digest paper refers to these as quantiles [1]
   
   > With the t-digest, accuracy for estimating the q quantile is nearly 
constant relative to q(1 − q). This is in contrast to earlier algorithms which 
had errors independent of q. The relative error bound of the t-digest is 
convenient when computing quantiles for q near 0 or 1 as is commonly required. 
As with the Q-digest algorithm, the accuracy/size trade-off for the t-digest 
can be controlled by setting a single compression parameter δ with the amount 
of memory required proportional only to Θ(δ).
   
   I see you are referring to R's `quantile` documentation that uses "probs".  
I would argue that they only did it because passing "quantile" to a "quantile" 
function would be weird. In the case of t-Digest, calling this argument 
"quantile" is more informative than "probability level". Probability of what?
   
   If you really want to rename this to something that diverges from the the 
original t-Digest paper, I suggest the term "fractions" that has been adoped by 
Redis [2] and Druid [3].
   
   [1] https://arxiv.org/pdf/1902.04023.pdf
   [2] https://redis.io/docs/data-types/probabilistic/t-digest/
   [3] 
https://druid.apache.org/docs/latest/development/extensions-contrib/tdigestsketch-quantiles/
   
   
   
    



##########
python/pyarrow/_compute.pyx:
##########
@@ -2145,7 +2145,8 @@ class QuantileOptions(_QuantileOptions):
     Parameters
     ----------
     q : double or sequence of double, default 0.5
-        Quantiles to compute. All values must be in [0, 1].
+        Probability levels of the quantiles to compute. All values must be in

Review Comment:
   "Probability level" is new terminology and could mean anything -- that is 
why it seems to work as a replacement for the ambiguous "quantile" here.
   
   "quantile" can refer to the calculated value and the desired quantile as a 
number between `0.0` and `1.0`. Isn't the comment saying they are between 0.0 
and 1.0 enough to disambiguate?
   
   The original t-Digest paper refers to these as quantiles [1]
   
   > With the t-digest, accuracy for estimating the q quantile is nearly 
constant relative to q(1 − q). This is in contrast to earlier algorithms which 
had errors independent of q. The relative error bound of the t-digest is 
convenient when computing quantiles for q near 0 or 1 as is commonly required. 
As with the Q-digest algorithm, the accuracy/size trade-off for the t-digest 
can be controlled by setting a single compression parameter δ with the amount 
of memory required proportional only to Θ(δ).
   
   I see you are referring to R's `quantile` documentation that uses "probs".  
I would argue that they only did it because passing "quantile" to a "quantile" 
function would be weird. In the case of t-Digest, calling this argument 
"quantile" is more informative than "probability level". Probability of what?
   
   If you really want to rename this to something that diverges from the the 
original t-Digest paper, I suggest the term "fractions" that has been adoped by 
Redis [2] and Druid [3].
   
   [1] https://arxiv.org/pdf/1902.04023.pdf
   [2] https://redis.io/docs/data-types/probabilistic/t-digest/
   [3] 
https://druid.apache.org/docs/latest/development/extensions-contrib/tdigestsketch-quantiles/
   
   
   
    



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to