[ 
https://issues.apache.org/jira/browse/BEAM-6694?focusedWorklogId=283566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-283566
 ]

ASF GitHub Bot logged work on BEAM-6694:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 26/Jul/19 19:33
            Start Date: 26/Jul/19 19:33
    Worklog Time Spent: 10m 
      Work Description: Hannah-Jiang commented on pull request #9153: 
[BEAM-6694] Added Approximate Quantile Transfrom on Python SDK
URL: https://github.com/apache/beam/pull/9153#discussion_r307882164
 
 

 ##########
 File path: sdks/python/apache_beam/transforms/combiners.py
 ##########
 @@ -924,3 +928,405 @@ def merge_accumulators(self, accumulators):
 
   def extract_output(self, accumulator):
     return accumulator[0]
+
+
+class ApproximateQuantiles(object):
+  """
+  PTransform for getting an idea of the data distribution using approximate
+  N-tiles (e.g. quartiles, percentiles, etc.), either globally or per-key.
+  """
+
+  @with_input_types(T)
+  @with_output_types(List[T])
+  class Globally(ptransform.PTransform):
+    """
+    PTransform takes PCollection and returns a list whose single value is
+    approximate N-tiles of the input collection globally.
+
+    Args:
+      num_quantiles: number of elements in the resulting quantiles values list.
+      compare: (optional) Comparator function which is an implementation
+        of "a < b" taking at least two arguments (a and b). Which is later
+        converted to key function as Python 3 does not support cmp.
+      key: (optional) Key is a mapping of elements to a comparable key,
+        similar to the key argument of Python's sorting methods.
+    """
+
+    def __init__(self, num_quantiles, compare=None, key=None, reverse=False):
+      self.num_quantiles = num_quantiles
+      self.compare = compare
+      self.key = key
+      self.reverse = reverse
+
+    def expand(self, pcoll):
+      return pcoll | core.CombineGlobally(ApproximateQuantilesCombineFn.create(
+          num_quantiles=self.num_quantiles, compare=self.compare,
+          key=self.key, reverse=self.reverse
+      ))
+
+    def display_data(self):
+      return {
+          'num_quantiles': DisplayDataItem(self.num_quantiles,
+                                           label="Quantile Count"),
+          'compare': DisplayDataItem(self.compare.__class__,
+                                     label='Record Comparer FN'),
+          'key': DisplayDataItem(self.key.__class__,
+                                 label='Record Comparer Key'),
+          'reverse': DisplayDataItem(self.reverse.__class__,
+                                     label='Is reversed'),
+      }
+
+  @with_input_types(Tuple[K, V])
+  @with_output_types(List[Tuple[K, V]])
+  class PerKey(ptransform.PTransform):
+    """
+    PTransform takes PCollection of KV and returns a list based on each key
+    whose single value is list of approximate N-tiles of the input element of
+    the key.
+
+    Args:
+      num_quantiles: number of elements in the resulting quantiles values list.
+      compare: (optional) Comparator function which is an implementation
+        of "a < b" taking at least two arguments (a and b). Which is later
+        converted to key function as Python 3 does not support cmp.
+      key: (optional) Key is a mapping of elements to a comparable key,
+        similar to the key argument of Python's sorting methods.
+    """
+
+    def __init__(self, num_quantiles, compare=None, key=None, reverse=False):
+      self.num_quantiles = num_quantiles
+      self.compare = compare
+      self.key = key
+      self.reverse = reverse
+
+    def expand(self, pcoll):
+      return pcoll | core.CombinePerKey(ApproximateQuantilesCombineFn.create(
+          num_quantiles=self.num_quantiles, compare=self.compare,
+          key=self.key, reverse=self.reverse
+      ))
+
+    def display_data(self):
+      return {
 
 Review comment:
   We return the same format for Globally and PerKey; can we share it?
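
   One possible way to share it, as a minimal sketch only: move the
   dictionary into a module-level helper that both classes call. The helper
   name `_quantiles_display_data` is hypothetical, and `DisplayDataItem` is
   replaced by a namedtuple stand-in so the snippet runs without Beam; in
   combiners.py the real `DisplayDataItem` would be used instead. The
   classes are also reduced to plain objects for the same reason.

```python
from collections import namedtuple

# Assumption: stand-in for Beam's DisplayDataItem so this sketch is
# self-contained. It only records the value and the label.
DisplayDataItem = namedtuple('DisplayDataItem', ['value', 'label'])


def _quantiles_display_data(num_quantiles, compare, key, reverse):
  """Hypothetical helper: builds the display data used by both transforms."""
  return {
      'num_quantiles': DisplayDataItem(num_quantiles,
                                       label='Quantile Count'),
      'compare': DisplayDataItem(compare.__class__,
                                 label='Record Comparer FN'),
      'key': DisplayDataItem(key.__class__,
                             label='Record Comparer Key'),
      'reverse': DisplayDataItem(reverse.__class__,
                                 label='Is reversed'),
  }


class Globally(object):
  """Simplified stand-in for ApproximateQuantiles.Globally."""

  def __init__(self, num_quantiles, compare=None, key=None, reverse=False):
    self.num_quantiles = num_quantiles
    self.compare = compare
    self.key = key
    self.reverse = reverse

  def display_data(self):
    # Delegate to the shared helper instead of repeating the dictionary.
    return _quantiles_display_data(
        self.num_quantiles, self.compare, self.key, self.reverse)


class PerKey(object):
  """Simplified stand-in for ApproximateQuantiles.PerKey."""

  def __init__(self, num_quantiles, compare=None, key=None, reverse=False):
    self.num_quantiles = num_quantiles
    self.compare = compare
    self.key = key
    self.reverse = reverse

  def display_data(self):
    # Same helper, so the two transforms cannot drift apart.
    return _quantiles_display_data(
        self.num_quantiles, self.compare, self.key, self.reverse)
```

   With this shape, a label change is made in one place and both transforms
   pick it up.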
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 283566)
    Time Spent: 1h 50m  (was: 1h 40m)

> ApproximateQuantiles transform for Python SDK
> ---------------------------------------------
>
>                 Key: BEAM-6694
>                 URL: https://issues.apache.org/jira/browse/BEAM-6694
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-py-core
>            Reporter: Ahmet Altay
>            Assignee: Shehzaad Nakhoda
>            Priority: Minor
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Add PTransforms for getting an idea of a PCollection's data distribution 
> using approximate N-tiles (e.g. quartiles, percentiles, etc.), either 
> globally or per-key.
> It should offer the same API as its Java counterpart: 
> https://github.com/apache/beam/blob/11a977b8b26eff2274d706541127c19dc93131a2/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/ApproximateQuantiles.java



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
