Each runner can choose to override the SortValues PTransform with its own internal implementation. For example, the Spark runner overrides the global combine [1] during pipeline translation. If the Spark runner detected the SortValues PTransform during translation, it could replace it with an implementation based on repartitionAndSortWithinPartitions.
GroupByKeyAndSortValuesOnly inside Dataflow exists to support a specific use case. Users should rely on SortValues, as it is the public implementation for sorting.

1: https://github.com/apache/beam/blob/85dcab56268fbac923ffd5885489ee154f097fc5/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/TransformTranslator.java#L200

As a side note, it's uncommon to need all values sorted. Usually the top 100 suffices, and that can be implemented much more efficiently with a combiner than with a full sort.

On Wed, May 30, 2018 at 3:38 AM <marek-simu...@seznam.cz> wrote:

> Hi,
> I have question I am trying to do translation in dsl-euphoria for
> “GroupByKey with sorted values within key” to Beam. I am aware of java sdk
> extensions SortValues, but it doesn’t have sufficient abstraction for
> runners.
>
> I noticed that in DataflowRunner there is translation of batch GroupByKey
> to GroupByKeyAndSortValuesOnly but is it considered to have it in beam core
> so for example SparkRunner could translate “GroupByKey with sorted values
> within key” with their internals such as repartitionAndSortWithinPartitions.
>
> Thank you.
> Marek Simunek
>
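
To illustrate the side note about top-N versus a full sort, here is a rough, library-free sketch in Python. It is not Beam code: TopNFn and top_n_per_key are hypothetical names, and the four methods only mimic the general shape of a combiner (create an accumulator, add inputs, merge accumulators, extract the output). The point is that each key keeps a bounded heap of at most n values instead of holding and sorting every value.

```python
import heapq


class TopNFn:
    """Hypothetical combiner-shaped helper that keeps only the n largest values."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n

    def create_accumulator(self):
        # A min-heap holding at most n values; heap[0] is the smallest kept value.
        return []

    def add_input(self, heap, value):
        if len(heap) < self.n:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # Evict the smallest kept value in favour of the larger newcomer.
            heapq.heapreplace(heap, value)
        return heap

    def merge_accumulators(self, heaps):
        # Partial results from different workers combine without a full sort.
        merged = self.create_accumulator()
        for heap in heaps:
            for value in heap:
                self.add_input(merged, value)
        return merged

    def extract_output(self, heap):
        # Only the n survivors are sorted, largest first.
        return sorted(heap, reverse=True)


def top_n_per_key(pairs, n):
    """Apply TopNFn to an iterable of (key, value) pairs, one accumulator per key."""
    fn = TopNFn(n)
    accumulators = {}
    for key, value in pairs:
        acc = accumulators.setdefault(key, fn.create_accumulator())
        fn.add_input(acc, value)
    return {key: fn.extract_output(acc) for key, acc in accumulators.items()}
```

With this shape, memory per key is bounded by n and partial accumulators can be merged in any order, which is roughly why a combiner tends to beat sorting all values when only the top few are needed.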