In comparing ParallelArrays (rivertrail) to the cuda thrust library, I noticed that Sorting using parallelArrays looks like a missing primitive operation.
Sorting is special because the good (high performance) algorithm on a gpu is radix sort, while the good (high performance) algorithm on the cpu is parallel merge sort, The other way around radix sort of cpu, or parallel merge sort of a gpu is very slow and often worse than serial implementations Sadly it is well past a jit to take the code for one flavor of sort and transform it into the other, while it would be pretty simple for a run-time to pick a good sort, if only it knew that a sort was going on. Run times would not be required to do so here since they can always pick a slow sort. I know this is a slippery road, since once you add another prim, adding more prims becomes ever easier but sorting seems pretty important. And array already has a sort, so adding a version that works on ParallelArrays does not seem so bad. What do you guys think? ----------------------------------------------------------------------------------- This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message. -----------------------------------------------------------------------------------
_______________________________________________ es-discuss mailing list [email protected] https://mail.mozilla.org/listinfo/es-discuss

