iindyk commented on a change in pull request #12420: URL: https://github.com/apache/beam/pull/12420#discussion_r468937788
########## File path: sdks/python/apache_beam/transforms/stats.py ########## @@ -398,8 +424,8 @@ class ApproximateQuantilesCombineFn(CombineFn): http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.6513&rep=rep1 &type=pdf - The default error bound is (1 / N), though in practice the accuracy - tends to be much better. + The default error bound is (1 / N) for uniformly distributed data and 1e-2 for Review comment: I think N rarely exceeds 100 and most of the time is <10, so I changed to 1e-2 to increase accuracy for the weighted case. This was done to reflect the fact that, although we guarantee this error bound, the bound itself is on the weight concentrated between the returned value and the actual quantile. If the weights are uneven, then there may potentially be a lot of values between those. But looking at it again, I think you're right and it would make more sense to set it to min(1e-2, 1/N). ########## File path: sdks/python/apache_beam/transforms/stats.py ########## @@ -263,30 +265,38 @@ class Globally(PTransform): Args: num_quantiles: number of elements in the resulting quantiles values list. + weighted: (optional) if set to True, the transform returns weighted + quantiles. The input PCollection is then expected to contain tuples of + input values with the corresponding weight. Review comment: Done. Although I wasn't able to find a pattern for examples (like for DocTest). Let me know if I should change the format ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org