[ https://issues.apache.org/jira/browse/BEAM-6694?focusedWorklogId=295566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295566 ]
ASF GitHub Bot logged work on BEAM-6694:
----------------------------------------

Author: ASF GitHub Bot
Created on: 15/Aug/19 17:00
Start Date: 15/Aug/19 17:00
Worklog Time Spent: 10m
Work Description: aaltay commented on pull request #9153: [BEAM-6694] Added Approximate Quantile Transfrom on Python SDK
URL: https://github.com/apache/beam/pull/9153#discussion_r314404234

##########
File path: sdks/python/apache_beam/transforms/stats_test.py
##########
@@ -345,5 +353,273 @@ def test_approximate_unique_globally_by_error_with_skewed_data(self):
     pipeline.run()
 
 
+class ApproximateQuantilesTest(unittest.TestCase):
+  _kv_data = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 10), ("b", 10),
+              ("b", 100)]
+  _kv_str_data = [("a", "a"), ("a", "a"*2), ("a", "a"*3), ("b", "b"),
+                  ("b", "b"*10), ("b", "b"*10), ("b", "b"*100)]
+
+  @staticmethod
+  def _quantiles_matcher(expected):
+    l = len(expected)
+
+    def assert_true(exp):
+      if not exp:
+        raise BeamAssertException('%s Failed assert True' % repr(exp))
+
+    def match(actual):
+      actual = actual[0]
+      for i in range(l):
+        if isinstance(expected[i], list):
+          assert_true(expected[i][0] <= actual[i] <= expected[i][1])
+        else:
+          equal_to([expected[i]])([actual[i]])
+
+    return match
+
+  @staticmethod
+  def _approx_quantile_generator(size, num_of_quantiles, absoluteError):
+    quantiles = [0]
+    k = 1
+    while k < num_of_quantiles - 1:
+      expected = (size - 1) * k / (num_of_quantiles - 1)
+      quantiles.append([expected - absoluteError, expected + absoluteError])
+      k = k + 1
+    quantiles.append(size - 1)
+    return quantiles
+
+  def test_quantiles_globaly(self):
+    with TestPipeline() as p:
+      pc = p | Create(range(101))

Review comment:
Let's change this to an actual list instead of a range object. The reason: if we later convert this into a ValidatesRunner test, the workers would also need the same `from builtins import range` import.
Suggestion: `pc = p | Create([x for x in range(101)])`

----------------------------------------------------------------

Issue Time Tracking
-------------------

Worklog Id: (was: 295566)
Time Spent: 8h 40m (was: 8.5h)

> ApproximateQuantiles transform for Python SDK
> ---------------------------------------------
>
> Key: BEAM-6694
> URL: https://issues.apache.org/jira/browse/BEAM-6694
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Ahmet Altay
> Assignee: Shehzaad Nakhoda
> Priority: Minor
> Time Spent: 8h 40m
> Remaining Estimate: 0h
>
> Add PTransforms for getting an idea of a PCollection's data distribution
> using approximate N-tiles (e.g. quartiles, percentiles, etc.), either
> globally or per-key.
> It should offer the same API as its Java counterpart:
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java
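The tolerance bands that `_approx_quantile_generator` builds for the test in this diff can be sanity-checked without running a pipeline. Below is a minimal sketch in plain Python (no Beam dependency), assuming the Java `ApproximateQuantiles` convention that a request for N quantiles returns N values with the minimum first and the maximum last, so N = 5 yields quartiles. `exact_quantiles`, `approx_quantile_bands` and `within_bounds` are hypothetical helpers written for illustration only, not part of the Beam API. `approx_quantile_bands` repeats the arithmetic of `_approx_quantile_generator`, and `exact_quantiles` uses a simple nearest-rank rule, which is only one of several quantile conventions. The input is built with `list(range(101))`, the materialized-list form the reviewer asks for.

```python
def exact_quantiles(values, num_quantiles):
    """Hypothetical exact reference: num_quantiles evenly spaced order
    statistics, including the minimum and the maximum (nearest-rank rule)."""
    if num_quantiles < 2:
        raise ValueError("num_quantiles must be at least 2")
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for k in range(num_quantiles):
        # Fractional rank of the k-th boundary, rounded to the nearest index.
        rank = round((n - 1) * k / (num_quantiles - 1))
        result.append(ordered[rank])
    return result


def approx_quantile_bands(size, num_of_quantiles, absolute_error):
    """Hypothetical re-implementation of the test's expected-value helper:
    exact endpoints, [low, high] tolerance bands for the interior boundaries."""
    quantiles = [0]
    for k in range(1, num_of_quantiles - 1):
        expected = (size - 1) * k / (num_of_quantiles - 1)
        quantiles.append([expected - absolute_error, expected + absolute_error])
    quantiles.append(size - 1)
    return quantiles


def within_bounds(expected, actual):
    """Hypothetical matcher: list entries are inclusive bands, scalars must match."""
    if len(expected) != len(actual):
        return False
    for exp, act in zip(expected, actual):
        if isinstance(exp, list):
            if not exp[0] <= act <= exp[1]:
                return False
        elif exp != act:
            return False
    return True


if __name__ == "__main__":
    data = list(range(101))  # a concrete list, not a range object
    quartiles = exact_quantiles(data, 5)
    bands = approx_quantile_bands(len(data), 5, 1)
    print(quartiles, bands, within_bounds(bands, quartiles))
```

For the 0..100 input used in `test_quantiles_globaly` and an illustrative absolute error of 1, the exact quartiles are `[0, 25, 50, 75, 100]`, which fall inside the bands `[0, [24.0, 26.0], [49.0, 51.0], [74.0, 76.0], 100]`; an interior value such as 27 in place of 25 would fall outside its band.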