Abacn commented on code in PR #31969: URL: https://github.com/apache/beam/pull/31969#discussion_r1695682193
########## sdks/python/apache_beam/metrics/cells.py: ########## @@ -553,3 +610,22 @@ def combine(self, x, y): def result(self, x): # type: (GaugeData) -> GaugeResult return GaugeResult(x.get_cumulative()) + + +class StringSetAggregator(MetricAggregator): + @staticmethod + def identity_element(): + # type: () -> set + return set() + + def combine(self, x, y): + # type: (set, set) -> set + if len(x) == 0: + return y + elif len(y) == 0: + return x + else: + return set.union(x, y) Review Comment: this mirrors #31803 . I wonder in most cases the aggregation encounters one empty and one non-empty cells, so a short cut optimization is reasonable here. > Could there be any concerns about side effects due to Aggregation returning a mutable collection This is a good point. I was hesitating if I should return set / frozenset . However in Python frozenset isn't instance of set (`isinstance(frozenset(), set)` is False) while Java ImmutableSet is Set, I ended up with set. Do you think we should make the result of StringSet query a frozenset instance everywhere? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org