Hi all,

I have been using SORT BY and ORDER BY in Spark SQL, and I observed the following.

When I use SORT BY and collect the results, the results come back sorted partition by partition. For example, if we have 1, 2, ..., 12 in 4 partitions and I want to sort in descending order, the partitions would hold:

p0 = 12, 8, 4
p1 = 11, 7, 3
p2 = 10, 6, 2
p3 = 9, 5, 1

so collect() would return 12, 8, 4, 11, 7, 3, 10, 6, 2, 9, 5, 1.

But when I use ORDER BY and collect the results:

p0 = 12, 11, 10
p1 = 9, 8, 7
.....

so collect() would return 12, 11, ..., 1, which is the desired result.

Is this the intended behavior of SORT BY and ORDER BY, or is there something I'm missing?

Cheers

--
Niranda
@n1r44 <https://twitter.com/N1R44>
https://pythagoreanscript.wordpress.com/
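
P.S. To make the observation concrete, here is a minimal plain-Python model of the two behaviors described above. It does not use Spark at all. The helper names (assign_partitions, sort_by_desc, order_by_desc) and the rule that assigns values to partitions are assumptions, chosen only so that the partitions match the example; this is a sketch of what I observed, not a statement of how Spark lays out data.

```python
def assign_partitions(values, num_partitions):
    """Place each value in a partition (hypothetical rule, picked to match the example)."""
    top = max(values)
    partitions = [[] for _ in range(num_partitions)]
    for v in values:
        partitions[(top - v) % num_partitions].append(v)
    return partitions


def sort_by_desc(partitions):
    """Model of the SORT BY result: sort inside each partition, then concatenate in partition order."""
    return [v for p in partitions for v in sorted(p, reverse=True)]


def order_by_desc(partitions):
    """Model of the ORDER BY result: one total ordering across all partitions."""
    return sorted((v for p in partitions for v in p), reverse=True)


parts = assign_partitions(list(range(1, 13)), 4)
print(parts)                 # partition contents before any sorting
print(sort_by_desc(parts))   # sorted within each partition only
print(order_by_desc(parts))  # sorted across all partitions
```

The only difference between the two functions is where the sort is applied: per partition before concatenating, or over everything at once.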