[ 
https://issues.apache.org/jira/browse/BEAM-6694?focusedWorklogId=295566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295566
 ]

ASF GitHub Bot logged work on BEAM-6694:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Aug/19 17:00
            Start Date: 15/Aug/19 17:00
    Worklog Time Spent: 10m 
      Work Description: aaltay commented on pull request #9153: [BEAM-6694] 
Added Approximate Quantile Transfrom on Python SDK
URL: https://github.com/apache/beam/pull/9153#discussion_r314404234
 
 

 ##########
 File path: sdks/python/apache_beam/transforms/stats_test.py
 ##########
 @@ -345,5 +353,273 @@ def 
test_approximate_unique_globally_by_error_with_skewed_data(self):
     pipeline.run()
 
 
+class ApproximateQuantilesTest(unittest.TestCase):
+  _kv_data = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 10), ("b", 10),
+              ("b", 100)]
+  _kv_str_data = [("a", "a"), ("a", "a"*2), ("a", "a"*3), ("b", "b"),
+                  ("b", "b"*10), ("b", "b"*10), ("b", "b"*100)]
+
+  @staticmethod
+  def _quantiles_matcher(expected):
+    l = len(expected)
+
+    def assert_true(exp):
+      if not exp:
+        raise BeamAssertException('%s Failed assert True' % repr(exp))
+
+    def match(actual):
+      actual = actual[0]
+      for i in range(l):
+        if isinstance(expected[i], list):
+          assert_true(expected[i][0] <= actual[i] <= expected[i][1])
+        else:
+          equal_to([expected[i]])([actual[i]])
+
+    return match
+
+  @staticmethod
+  def _approx_quantile_generator(size, num_of_quantiles, absoluteError):
+    quantiles = [0]
+    k = 1
+    while k < num_of_quantiles - 1:
+      expected = (size - 1) * k / (num_of_quantiles - 1)
+      quantiles.append([expected - absoluteError, expected + absoluteError])
+      k = k + 1
+    quantiles.append(size - 1)
+    return quantiles
+
+  def test_quantiles_globaly(self):
+    with TestPipeline() as p:
+      pc = p | Create(range(101))
 
 Review comment:
   Let's change this to an actual list instead of a range object. The reason,  
if in the future we convert this to a validates runner test, we would need to 
do the same `from builtins import range` in the workers.
   
   Suggestion:
   `pc = p | Create([x for x in range(101)])`
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 295566)
    Time Spent: 8h 40m  (was: 8.5h)

> ApproximateQuantiles transform for Python SDK
> ---------------------------------------------
>
>                 Key: BEAM-6694
>                 URL: https://issues.apache.org/jira/browse/BEAM-6694
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Ahmet Altay
>            Assignee: Shehzaad Nakhoda
>            Priority: Minor
>          Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> Add PTransforms for getting an idea of a PCollection's data distribution 
> using approximate N-tiles (e.g. quartiles, percentiles, etc.), either 
> globally or per-key.
> It should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

Reply via email to