RE: parallel arrays and sorting

Herhut, Stephan A Thu, 18 Apr 2013 21:47:13 -0700

Thanks, this indeed is an interesting use case.

Previously, I thought that using a function to map objects to a sort key is at 
least as expressive as having two arrays, as a naïve implementation of the 
mapping function could just be to read the appropriate value from the key array 
(assuming the mapping function would know the index). However, such a solution 
does not generate a sorted key array and thus the correlation between the 
sorted objects and their keys is lost.


After reading the blog post you linked, I am still not sure whether that 
actually is an issue. It seems for a rendering pipeline only the sorted objects 
are of importance.

So, potential runtime efficiency issues aside, do you know a use case where one 
also needs the array of sorted keys?

Stephan

From: Kevin Gadd [mailto:[email protected]]
Sent: Thursday, April 11, 2013 11:43 AM
To: Norm Rubin
Cc: Herhut, Stephan A; [email protected]<mailto:[email protected]>; 
[email protected]<mailto:[email protected]>
Subject: Re: parallel arrays and sorting

One simple-ish real world example of a use case that can benefit from a 
parallel sort:
It's common in games and other realtime rendering scenarios to want to sort 
your queue of rendering operations by various attributes, in order to minimize 
the number of hardware state changes and allow you to batch up operations.

http://realtimecollisiondetection.net/blog/?p=86 is an article that describes 
an approach used to generate 64-bit 'sort keys' for drawing operations that was 
used in some current-gen console games. At present, I use a similar technique 
to sort drawing operations in my buffers, and more importantly, I do all the 
sorting and preparation operations in parallel. For complex scenes I could 
imagine it being quite common to want to leverage parallel machinery (or even 
GPU hardware) to sort the thousands of individual drawing operations in a given 
scene before clustering them up into larger hardware operations.
The proposed compromise of providing a function that maps the objects being 
sorted to 'sort keys' seems a suitable choice for this kind of scenario - the 
draw calls can have a sort key implicitly (like described in that blog post), 
or you can trivially manufacture one (that's what I do). This is definitely a 
use case where only being able to sort primitives would not be adequate. 
However, a sort primitive that operates on a pair of arrays would also suffice 
- Norm calls this 'sort by key', and the .NET BCL exposes this as an overload 
of Array.Sort. It's quite useful and it seems like it might be feasible to 
expose this, even if only the key-sorting occurs in parallel (and the value 
sorting can't because the values are ordinary JS objects).

On Thu, Apr 11, 2013 at 11:35 AM, Norm Rubin 
<[email protected]<mailto:[email protected]>> wrote:
Good set of questions about sort,  thanks for looking at this carefully.  You 
guys have  been great about responding to my emails.



One sorting application that I would expect is physics particle simulation, 
where the code  might sort objects based on distance from the viewer. This 
could be used so that  far away objects get rendered smaller.

Another might be  computing a histogram  over a visual image – sort the data 
first so nearby values will go into the same bucket, without needing atomic 
synchronization.

Sorting with an optional compare seems ok  to me- even if implementations can 
only optimize for  some  comparisons

One  interesting  question you brought up as  part of sorting by key is pretty 
tricky.

In general parallel data arrays  would like to be stored in transposed  order 
from classic cpu styles.
We usually call this array of structures or structures of arrays
Do we sort  an array of objects each with multiple fields stored  one object 
after another or do we sort multiple arrays,  with all the values of field1 
followed by all the values of field2 etc

I’d hope we allow  implementations to reorder the data within a parallelArray 
as they like without needing to expose the layout to developers.  Thrust, on 
the other hand, pushes the layout  to the developer and thus has a significant 
set of transpose routines.

Just for a point of reference there are  8  versions of sort in the thrust 
library. As  you can see, Thrust never aimed to be a minimal set it just gained 
operations as applications appeared. AMD has a similar library called bolt 
which uses the same interface.
Two different parallel  libraries with the same routines does suggest that this 
set is enough  to do useful work.


Internally thrust notices  that the compare is  <  on primitive  types and  
uses radix sort (on the gpu).

Thurst includes:
  Stable and unstable forms
Ascending only or with a comparison function
Single array or a pair of keys and values


1)      Sort(array)  - sorts the array into ascending order, not guaranteed to 
be stable

2)      Sort (array)– with a comparison function

3)      Sort by key (keys, values) returns both the reordered keys and  the 
reordered values, sorted by the array keys. This one is kind of confusing so 
here is an example

// an example of key sorting
#include <thrust/sort.h<http://docs.thrust.googlecode.com/hg/sort_8h.html>>
  const int N = 6;
  int    keys[N] = {  1,   4,   2,   8,   5,   7};
  char values[N] = {'a', 'b', 'c', 'd', 'e', 'f'};
  
thrust::sort_by_key<http://docs.thrust.googlecode.com/hg/group__sorting.html#ga2bb765aeef19f6a04ca8b8ba11efff24>(keys,
 keys + N, values, 
thrust::greater<int><http://docs.thrust.googlecode.com/hg/structthrust_1_1greater.html>());
  // keys is now   {  8,   7,   5,   4,   2,   1}
  // values is now {'d', 'f', 'e', 'b', 'c', 'a'}




4)      Sort by key with a  comparison function

5-8 the stable forms


Based on the thrust primitives, one  parallel array sort  might be

Sort an array of objects
Besides the array the arguments might be
- a flag indicating if the sort must be  stable
- an optional user compare function that defaulted to < on primitive types


From: Herhut, Stephan A 
[mailto:[email protected]<mailto:[email protected]>]
Sent: Thursday, April 11, 2013 1:09 PM
To: Norm Rubin; [email protected]<mailto:[email protected]>
Subject: RE: parallel arrays and sorting

Rick is travelling, so let me chime in.

We have discussed this back and forth but have not come to a conclusion. 
Generally, we agree that adding a sort primitive makes a lot of sense, in 
particular as the Array object in JavaScript already has a sort method. Also, 
as you too mentioned, implementing an efficient sort as a library function 
without knowing the details of the parallel hardware used is difficult, to say 
the least. So sort ticks all the boxes to become a primitive.

The other, and arguably more difficult question, is what a sort method should 
look like. If we take JavaScript’s existing Array.sort, the sort method would 
get an (optional) comparator function. However, using a comparator would 
preclude the use of radix sort.
An alternative would be to implement sorting of primitive types only. This 
brings back more choice in sort algorithms but limits use. For such a design, 
we considered a function as optional argument to sort that, given a value from 
the ParallelArray to be sorted, returns a key used for comparison, which again 
needs to be a primitive. This would at least enable sorting of objects by a 
field and, at some runtime cost, sorting of general data.

The tradeoff between these two approaches, and probably other designs, is hard 
to judge without knowing what sort is used for. So we decided to wait for some 
good use cases before deciding on a specific design.

Sorry, no answers only further questions.
  Stephan

From: [email protected]<mailto:[email protected]> 
[mailto:[email protected]] On Behalf Of Norm Rubin
Sent: Monday, April 08, 2013 7:46 AM
To: [email protected]<mailto:[email protected]>
Subject: parallel arrays and sorting

In comparing ParallelArrays (rivertrail) to the cuda thrust library,  I noticed 
 that Sorting using parallelArrays looks like a missing primitive  operation.

Sorting is  special because the good (high performance) algorithm on a gpu is 
radix sort, while the good (high performance) algorithm on the cpu is parallel 
merge sort,  The other way around radix sort of cpu, or parallel merge sort of 
a gpu is very slow and often worse than serial implementations

Sadly it is well past a jit to take the code for one flavor of sort and 
transform it into the other,  while  it would be pretty simple for a run-time 
to pick a good sort, if only it knew that a sort was going on.  Run times would 
not be required to do  so  here since they can always pick a slow sort.

I know this is a slippery road, since once you add another prim, adding more 
prims becomes ever easier  but sorting seems pretty important.   And array 
already has a sort, so adding a version that works on ParallelArrays does not 
seem so bad.

What do you guys think?


________________________________
This email message is for the sole use of the intended recipient(s) and may 
contain confidential information.  Any unauthorized review, use, disclosure or 
distribution is prohibited.  If you are not the intended recipient, please 
contact the sender by reply email and destroy all copies of the original 
message.
________________________________

_______________________________________________
es-discuss mailing list
[email protected]<mailto:[email protected]>
https://mail.mozilla.org/listinfo/es-discuss



--
-kg

_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

RE: parallel arrays and sorting

Reply via email to