lukecwik commented on a change in pull request #11060: [BEAM-9454] Create
Deduplication transform based on user timer/state
URL: https://github.com/apache/beam/pull/11060#discussion_r391728602
##########
File path: sdks/python/apache_beam/runners/sdf_utils.py
##########
@@ -244,3 +251,63 @@ def get_estimator_state(self):
return None
return _NoOpWatermarkEstimator()
+
+
+class DeduplictaionWithinDuration(ptransform.PTransform):
Review comment:
It would be useful to expose a keyed deduplication transform as the common
implementation that all variants use internally, so that in the future we can
turn it into a well-known URN and runners can then provide optimized
deduplication transform implementations.
We want pipeline authors to use this transform, and I think it should go into
sdks/python/apache_beam/transforms/util.py or into a dedicated file under
sdks/python/apache_beam/transforms/, such as deduplicate.py.
CC: @udim What do you think?
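To make the suggested layering concrete, here is a minimal plain-Python sketch,
not Beam code and not the implementation in this PR. It assumes input arrives in
timestamp order, and the names deduplicate_per_key and deduplicate are
hypothetical. A dict stands in for per-key user state plus an expiry timer; the
value-level function only assigns keys and delegates to the keyed core.

```python
from typing import Dict, Hashable, Iterable, Iterator, Tuple


def deduplicate_per_key(
    keyed: Iterable[Tuple[Hashable, object, float]], duration: float
) -> Iterator[Tuple[Hashable, object]]:
    """Hypothetical keyed core: emit the first element seen for each key and
    drop later elements for that key until `duration` seconds have passed.

    `keyed` yields (key, value, timestamp) tuples in timestamp order.
    """
    # Stand-in for per-key state plus an expiry timer: key -> expiry time.
    expires_at: Dict[Hashable, float] = {}
    for key, value, ts in keyed:
        expiry = expires_at.get(key)
        if expiry is not None and ts < expiry:
            continue  # Key was already seen inside the window: drop it.
        # First sighting, or the previous window expired: start a new window.
        expires_at[key] = ts + duration
        yield key, value


def deduplicate(
    values: Iterable[Tuple[Hashable, float]], duration: float
) -> Iterator[Hashable]:
    """Hypothetical value-level wrapper: use each value as its own key and
    reuse the keyed core, so there is a single deduplication implementation."""
    keyed = ((value, value, ts) for value, ts in values)
    return (value for _, value in deduplicate_per_key(keyed, duration))


if __name__ == "__main__":
    events = [("a", 0.0), ("b", 1.0), ("a", 2.0), ("a", 10.0)]
    print(list(deduplicate(events, duration=5.0)))
```

With this split, only the keyed core would need a runner-specific replacement;
every wrapper built on top of it would pick up the optimization unchanged.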