Abacn commented on code in PR #31969:
URL: https://github.com/apache/beam/pull/31969#discussion_r1695682193


##########
sdks/python/apache_beam/metrics/cells.py:
##########
@@ -553,3 +610,22 @@ def combine(self, x, y):
   def result(self, x):
     # type: (GaugeData) -> GaugeResult
     return GaugeResult(x.get_cumulative())
+
+
+class StringSetAggregator(MetricAggregator):
+  @staticmethod
+  def identity_element():
+    # type: () -> set
+    return set()
+
+  def combine(self, x, y):
+    # type: (set, set) -> set
+    if len(x) == 0:
+      return y
+    elif len(y) == 0:
+      return x
+    else:
+      return set.union(x, y)

Review Comment:
   this mirrors #31803 . I wonder in most cases the aggregation encounters one 
empty and one non-empty cells, so a short cut optimization is reasonable here.
   
   > Could there be any concerns about side effects due to Aggregation 
returning a mutable collection
   
   This is a good point. I was hesitating if I should return set / frozenset . 
However in Python frozenset isn't instance of set (`isinstance(frozenset(), 
set)` is False) while Java ImmutableSet is Set, I ended up with set. Do you 
think we should make the result of StringSet query a frozenset instance 
everywhere?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to