Comparing ParallelArray (River Trail) with the CUDA Thrust library, I noticed 
that sorting looks like a missing primitive operation on ParallelArray.

Sorting is special because the good (high-performance) algorithm on a GPU is 
radix sort, while the good (high-performance) algorithm on a CPU is parallel 
merge sort. The other way around, radix sort on a CPU or parallel merge sort on 
a GPU, is very slow and often worse than a serial implementation.

Sadly, it is well beyond a JIT to take the code for one flavor of sort and 
transform it into the other, while it would be pretty simple for a run-time 
to pick a good sort, if only it knew that a sort was going on. Run-times would 
not be required to do so, since they can always fall back to a slow sort.
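
To make the idea concrete, here is a rough sketch in plain JavaScript of what 
"the run-time picks the sort" could look like. Everything in it is made up for 
illustration: sortWithBackend, radixSortUint32, mergeSort and the "gpu"/"cpu" 
backend strings are hypothetical names, not part of River Trail or Thrust, and 
both sorts run serially here. The radix sort also assumes the keys are 
non-negative integers below 2^32.

```javascript
// Hypothetical sketch only: none of these names exist in River Trail or Thrust.

// LSD radix sort, one byte per pass (four passes for 32-bit keys).
// Assumes every value is a non-negative integer below 2^32.
function radixSortUint32(input) {
  let src = Uint32Array.from(input);
  let dst = new Uint32Array(src.length);
  for (let shift = 0; shift < 32; shift += 8) {
    // count[d + 1] holds how many keys have digit d in this pass.
    const count = new Uint32Array(257);
    for (let i = 0; i < src.length; i++) {
      count[((src[i] >>> shift) & 0xff) + 1]++;
    }
    // Prefix sum: count[d] becomes the first output slot for digit d.
    for (let d = 0; d < 256; d++) {
      count[d + 1] += count[d];
    }
    // Stable scatter into the output buffer.
    for (let i = 0; i < src.length; i++) {
      const digit = (src[i] >>> shift) & 0xff;
      dst[count[digit]++] = src[i];
    }
    [src, dst] = [dst, src];
  }
  return Array.from(src);
}

// Top-down merge sort. The two recursive calls are independent, which is
// what a parallel CPU version would exploit; this sketch runs them serially.
function mergeSort(input, compare) {
  if (input.length <= 1) return input.slice();
  const mid = input.length >> 1;
  const left = mergeSort(input.slice(0, mid), compare);
  const right = mergeSort(input.slice(mid), compare);
  const out = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    // Take from the left on ties so the sort stays stable.
    out.push(compare(left[i], right[j]) <= 0 ? left[i++] : right[j++]);
  }
  while (i < left.length) out.push(left[i++]);
  while (j < right.length) out.push(right[j++]);
  return out;
}

// Because the caller only says "sort this", the run-time is free to choose.
// The backend argument stands in for whatever the run-time knows internally.
function sortWithBackend(values, backend) {
  if (backend === "gpu") return radixSortUint32(values);
  if (backend === "cpu") return mergeSort(values, (a, b) => a - b);
  // A run-time that does not want to specialize can always do this.
  return values.slice().sort((a, b) => a - b);
}
```

The point is only that the caller writes sortWithBackend(data, ...) once and 
gets the same sorted result whichever branch runs; a run-time that cannot or 
will not specialize just takes the last branch.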

I know this is a slippery slope, since once you add another primitive, adding 
more primitives becomes ever easier, but sorting seems pretty important. And 
Array already has a sort, so adding a version that works on ParallelArray does 
not seem so bad.

What do you guys think?



_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
