To get a major speedup from *single-pass* map/filter/reduce operations on an array in GPU memory, wouldn't you need to stream the columnar data directly into GPU memory somehow? You might find in your experiments that GPU memory allocation is a bottleneck. See e.g. John Canny's paper (Section 1.1, paragraph 2): http://www.cs.berkeley.edu/~jfc/papers/13/BIDMach.pdf If the per-item operation is expensive, though, a dramatic GPU speedup may be more likely.
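To make the transfer-versus-compute point concrete, here is a rough back-of-envelope model in Python. Every bandwidth and throughput figure in it is an illustrative assumption of mine, not a measurement, and the function names are hypothetical. It only shows the shape of the trade-off: with little work per byte the host-to-device copy dominates, and with a lot of work per byte the GPU can come out ahead.

```python
def cpu_time(n_bytes, ops_per_byte, mem_bw=40e9, ops_rate=50e9):
    # Roofline-style estimate: a single pass is bound by whichever is
    # slower, reading the data or doing the arithmetic.
    # mem_bw (bytes/s) and ops_rate (ops/s) are assumed figures.
    return max(n_bytes / mem_bw, n_bytes * ops_per_byte / ops_rate)


def gpu_time(n_bytes, ops_per_byte, pcie_bw=12e9, mem_bw=300e9, ops_rate=2000e9):
    # Same estimate for the GPU, plus the cost of copying the input over
    # the bus first. All three rates are assumed figures.
    transfer = n_bytes / pcie_bw
    kernel = max(n_bytes / mem_bw, n_bytes * ops_per_byte / ops_rate)
    return transfer + kernel


def gpu_speedup(n_bytes, ops_per_byte):
    # Values above 1.0 mean the GPU path is estimated to be faster.
    return cpu_time(n_bytes, ops_per_byte) / gpu_time(n_bytes, ops_per_byte)


if __name__ == "__main__":
    one_gb = 1e9
    for work in (1, 10, 100):
        print(f"ops/byte={work:>3}: estimated speedup {gpu_speedup(one_gb, work):.2f}x")
```

The model ignores allocation cost, kernel launch overhead, and any overlap of transfer with compute, so treat it as a way to reason about the question rather than a prediction.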
Something related (and perhaps easier to contribute to Spark) might be a GPU-accelerated sorter for sorting Unsafe records, especially since that code is already factored out fairly well, e.g. `UnsafeInMemorySorter`. Spark appears to use (single-threaded) Timsort for sorting Unsafe records, so I imagine a multi-threaded/multi-core GPU solution could handily beat that.
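As a sketch of what such a sorter might parallelize, here is a plain-Python LSD radix sort over (key prefix, record pointer) pairs. This is not Spark's code. It assumes each record can be represented by an unsigned integer key prefix plus a pointer, and the function name and parameters are hypothetical. It runs serially; the point is that each pass splits into a histogram, a prefix sum, and a scatter, which are the phases a GPU radix sort would typically run in parallel.

```python
def radix_sort_by_prefix(prefixes, pointers, key_bits=64, radix_bits=8):
    """Return pointers ordered by their unsigned integer key prefix (stable)."""
    buckets = 1 << radix_bits
    mask = buckets - 1
    pairs = list(zip(prefixes, pointers))

    # One pass per digit, least significant digit first.
    for shift in range(0, key_bits, radix_bits):
        # Phase 1: histogram of the current digit.
        counts = [0] * buckets
        for prefix, _ in pairs:
            counts[(prefix >> shift) & mask] += 1

        # Phase 2: exclusive prefix sum gives each bucket's start offset.
        offsets = [0] * buckets
        total = 0
        for digit in range(buckets):
            offsets[digit] = total
            total += counts[digit]

        # Phase 3: stable scatter into the output buffer.
        out = [None] * len(pairs)
        for pair in pairs:
            digit = (pair[0] >> shift) & mask
            out[offsets[digit]] = pair
            offsets[digit] += 1
        pairs = out

    return [pointer for _, pointer in pairs]
```

Records whose prefixes tie keep their input order here; a real sorter would presumably still need a full-record comparison to break such ties.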