[ 
https://issues.apache.org/jira/browse/BEAM-6694?focusedWorklogId=296229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-296229
 ]

ASF GitHub Bot logged work on BEAM-6694:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 16/Aug/19 11:05
            Start Date: 16/Aug/19 11:05
    Worklog Time Spent: 10m 
      Work Description: mszb commented on pull request #9153: [BEAM-6694] Added Approximate Quantile Transform on Python SDK
URL: https://github.com/apache/beam/pull/9153#discussion_r314675928
 
 

 ##########
 File path: sdks/python/apache_beam/transforms/stats_test.py
 ##########
 @@ -345,5 +353,273 @@ def test_approximate_unique_globally_by_error_with_skewed_data(self):
     pipeline.run()
 
 
+class ApproximateQuantilesTest(unittest.TestCase):
+  _kv_data = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 10), ("b", 10),
+              ("b", 100)]
+  _kv_str_data = [("a", "a"), ("a", "a"*2), ("a", "a"*3), ("b", "b"),
+                  ("b", "b"*10), ("b", "b"*10), ("b", "b"*100)]
+
+  @staticmethod
+  def _quantiles_matcher(expected):
+    l = len(expected)
+
+    def assert_true(exp):
+      if not exp:
+        raise BeamAssertException('%s Failed assert True' % repr(exp))
+
+    def match(actual):
+      actual = actual[0]
+      for i in range(l):
+        if isinstance(expected[i], list):
+          assert_true(expected[i][0] <= actual[i] <= expected[i][1])
+        else:
+          equal_to([expected[i]])([actual[i]])
+
+    return match
+
+  @staticmethod
+  def _approx_quantile_generator(size, num_of_quantiles, absoluteError):
+    quantiles = [0]
+    k = 1
+    while k < num_of_quantiles - 1:
+      expected = (size - 1) * k / (num_of_quantiles - 1)
+      quantiles.append([expected - absoluteError, expected + absoluteError])
+      k = k + 1
+    quantiles.append(size - 1)
+    return quantiles
+
+  def test_quantiles_globaly(self):
+    with TestPipeline() as p:
+      pc = p | Create(range(101))
 
 Review comment:
   I've changed it to `pc = p | Create(list(range(101)))`, since `list()` is a 
bit faster than a list comprehension, which loops over each element explicitly.
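
For illustration only, a minimal sketch of the comparison behind this change, assuming Python 3, where `range()` is lazy and has to be materialized. The helper name `time_both` is hypothetical and not part of the pull request. Timings vary by interpreter and machine, so no figures are claimed here.

```python
import timeit

# Both expressions build the same 101-element list in Python 3.
via_list = list(range(101))
via_comprehension = [i for i in range(101)]


def time_both(repeats=10000):
    """Return (list_seconds, comprehension_seconds) for `repeats` runs each.

    Hypothetical helper for illustration; it is not part of the Beam tests.
    """
    t_list = timeit.timeit("list(range(101))", number=repeats)
    t_comp = timeit.timeit("[i for i in range(101)]", number=repeats)
    return t_list, t_comp


if __name__ == "__main__":
    # Prints two durations in seconds; compare them on your own machine.
    print(time_both())
```

Both forms produce identical input for `Create`, so the choice affects only readability and a small amount of setup time in the test.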
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 296229)
    Time Spent: 9h 10m  (was: 9h)

> ApproximateQuantiles transform for Python SDK
> ---------------------------------------------
>
>                 Key: BEAM-6694
>                 URL: https://issues.apache.org/jira/browse/BEAM-6694
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Ahmet Altay
>            Assignee: Shehzaad Nakhoda
>            Priority: Minor
>          Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Add PTransforms for getting an idea of a PCollection's data distribution 
> using approximate N-tiles (e.g. quartiles, percentiles, etc.), either 
> globally or per-key.
> It should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java
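
To make the N-tile idea concrete, here is a minimal pure-Python sketch that computes exact (not approximate) quantiles, globally and per key. It is not the Beam transform and does not use the Beam API. The function names `exact_quantiles` and `quantiles_per_key` are hypothetical, and the convention that the requested count includes the minimum and maximum is an assumption taken from the test helper in the diff above.

```python
from collections import defaultdict


def exact_quantiles(values, num_quantiles):
    """Return `num_quantiles` evenly spaced order statistics.

    Assumption: the minimum and maximum count toward `num_quantiles`,
    so num_quantiles=5 yields min, three quartiles, and max.
    """
    if num_quantiles < 2:
        raise ValueError("num_quantiles must be at least 2")
    ordered = sorted(values)
    if not ordered:
        return []
    last = len(ordered) - 1
    # Pick the element nearest to each evenly spaced rank.
    return [
        ordered[int(round(last * k / (num_quantiles - 1)))]
        for k in range(num_quantiles)
    ]


def quantiles_per_key(pairs, num_quantiles):
    """Group (key, value) pairs by key and compute quantiles for each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {
        key: exact_quantiles(vals, num_quantiles)
        for key, vals in grouped.items()
    }
```

For the integers 0 through 100 and five quantiles, `exact_quantiles` returns 0, 25, 50, 75 and 100. An approximate implementation would be expected to return values within some error bound of those positions, which is what the interval-based matcher in the test checks.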



