Hey all,

I was poking around the code and looking at `Distinct`, and I'm confused about
why it is implemented the way it is.

Reproduced here:
@ptransform_fn
@typehints.with_input_types(T)
@typehints.with_output_types(T)
def Distinct(pcoll):  # pylint: disable=invalid-name
  """Produces a PCollection containing distinct elements of a PCollection."""
  return (
      pcoll
      | 'ToPairs' >> Map(lambda v: (v, None))
      | 'Group' >> CombinePerKey(lambda vs: None)
      | 'Distinct' >> Keys())

Could anyone clarify why we'd use a `CombinePerKey` instead of just using
`GroupByKey`?

Cheers,
Joey
