Hi,

Yes, this is the intended behavior. ORDER BY guarantees a total order across the whole output, while SORT BY only guarantees the order within each partition.
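
To make the difference concrete, here is a small plain-Python sketch of the example from your message. It does not call Spark at all. The partition layout is hard-coded to match your numbers, and the helper names (collect, sort_by_desc, order_by_desc) are made up for illustration, so treat it as a toy model rather than how Spark implements either clause.

```python
# Toy model of the quoted example: values 1..12 spread over 4 partitions.
# ASSUMPTION: this layout is hard-coded to match the example; Spark did not produce it.
partitions = [[4, 8, 12], [3, 7, 11], [2, 6, 10], [1, 5, 9]]


def collect(parts):
    """Concatenate the partitions in index order, the way collect() returns them."""
    return [value for part in parts for value in part]


def sort_by_desc(parts):
    """Model of SORT BY ... DESC: each partition is sorted on its own."""
    return [sorted(part, reverse=True) for part in parts]


def order_by_desc(parts):
    """Model of ORDER BY ... DESC: one total order, split into contiguous ranges."""
    ordered = sorted(collect(parts), reverse=True)
    size = -(-len(ordered) // len(parts))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


sort_by_result = collect(sort_by_desc(partitions))
order_by_result = collect(order_by_desc(partitions))

print(sort_by_result)   # sorted inside each partition only
print(order_by_result)  # sorted across all partitions
```

In this model SORT BY never moves a row to another partition, which is why the collected output is only sorted in runs of three.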

Vishnu

On Thu, Sep 3, 2015 at 10:49 AM, Niranda Perera <niranda.per...@gmail.com> wrote:

> Hi all,
>
> I have been using sort by and order by in spark sql and I observed the
> following
>
> when using SORT BY and collect results, the results are getting sorted
> partition by partition.
> example:
> if we have 1, 2, ... , 12 and 4 partitions and I want to sort it in
> descending order,
> partition 0 (p0) would have 12, 8, 4
> p1 = 11, 7, 3
> p2 = 10, 6, 2
> p3 = 9, 5, 1
>
> so collect() would return 12, 8, 4, 11, 7, 3, 10, 6, 2, 9, 5, 1
>
> BUT when I use ORDER BY and collect results
> p0 = 12, 11, 10
> p1 = 9, 8, 7
> .....
> so collect() would return 12, 11, .., 1 which is the desirable result.
>
> is this the intended behavior of SORT BY and ORDER BY or is there
> something I'm missing?
>
> cheers
>
> --
> Niranda
> @n1r44 <https://twitter.com/N1R44>
> https://pythagoreanscript.wordpress.com/