[GitHub] [beam] iindyk commented on a change in pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

GitBox Tue, 11 Aug 2020 17:30:46 -0700


iindyk commented on a change in pull request #12420:
URL: https://github.com/apache/beam/pull/12420#discussion_r468937788




##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -398,8 +424,8 @@ class ApproximateQuantilesCombineFn(CombineFn):
   http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.6513&rep=rep1
   &type=pdf
 
-  The default error bound is (1 / N), though in practice the accuracy
-  tends to be much better.
+  The default error bound is (1 / N) for uniformly distributed data and 1e-2 
for

Review comment:
       I think N rarely exceeds 100 and most of the time is <10, so I changed 
to 1e-2 to increase accuracy for the weighted case. This was done to reflect 
the fact that, although we guarantee this error bound, the bound itself is on 
the weight concentrated between the returned value and the actual quantile. If 
the weights are uneven, then there may potentially be a lot of values between 
those. 
   But looking at it again, I think you're right and it would make more sense 
to set it to min(1e-2, 1/N).

##########
File path: sdks/python/apache_beam/transforms/stats.py
##########
@@ -263,30 +265,38 @@ class Globally(PTransform):
 
     Args:
       num_quantiles: number of elements in the resulting quantiles values list.
+      weighted: (optional) if set to True, the transform returns weighted
+        quantiles. The input PCollection is then expected to contain tuples of
+        input values with the corresponding weight.

Review comment:
       Done. Although I wasn't able to find a pattern for examples (like for 
DocTest). Let me know if I should change the format




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [beam] iindyk commented on a change in pull request #12420: Extending ApproximateQuantiles functionality to deal with non-uniform weights.

Reply via email to