Abacn commented on code in PR #31969:
URL: https://github.com/apache/beam/pull/31969#discussion_r1695682193
##########
sdks/python/apache_beam/metrics/cells.py:
##########
@@ -553,3 +610,22 @@ def combine(self, x, y):
def result(self, x):
# type: (GaugeData) -> GaugeResult
return GaugeResult(x.get_cumulative())
+
+
+class StringSetAggregator(MetricAggregator):
+ @staticmethod
+ def identity_element():
+ # type: () -> set
+ return set()
+
+ def combine(self, x, y):
+ # type: (set, set) -> set
+ if len(x) == 0:
+ return y
+ elif len(y) == 0:
+ return x
+ else:
+ return set.union(x, y)
Review Comment:
this mirrors #31803 . I wonder in most cases the aggregation encounters one
empty and one non-empty cells, so a short cut optimization is reasonable here.
> Could there be any concerns about side effects due to Aggregation
returning a mutable collection
This is a good point. I was hesitating if I should return set / frozenset .
However in Python frozenset isn't instance of set (`isinstance(frozenset(),
set)` is False) while Java ImmutableSet is Set, I ended up with set. Do you
think we should make the result of StringSet query a frozenset instance
everywhere?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]