In order to get a major speedup from applying *single-pass* map/filter/reduce
operations to an array in GPU memory, wouldn't you need to stream the
columnar data directly into GPU memory somehow? You might find in your
experiments that GPU memory allocation is a bottleneck. See, e.g., John
Canny's paper (Section 1.1, paragraph 2):
http://www.cs.berkeley.edu/~jfc/papers/13/BIDMach.pdf
If the per-item operation is computationally expensive, though, a dramatic
GPU speedup may be more likely, since the cost of getting the data onto the
GPU is then amortized over much more work per item.
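To make that trade-off concrete, here's a crude back-of-envelope model (a sketch, not a benchmark). It assumes the column has to cross PCIe once and that the reduced result is tiny. The class name and every throughput/bandwidth figure in `main` are hypothetical placeholders, and the model ignores allocation cost, CPU memory bandwidth, and overlapping transfer with compute.

```java
/**
 * Back-of-envelope model (hypothetical parameters, not measurements):
 * does offloading a single-pass map/filter/reduce to a GPU pay off once
 * the host-to-device copy is included? Ignores allocation and CPU memory
 * bandwidth, and assumes no overlap of transfer and compute.
 */
public class GpuOffloadModel {

    /** Seconds to run the per-item work on the CPU. */
    static double cpuSeconds(long items, double flopsPerItem, double cpuFlops) {
        return items * flopsPerItem / cpuFlops;
    }

    /** Seconds to copy the input to the GPU and then run the same work there. */
    static double gpuSeconds(long items, int bytesPerItem, double flopsPerItem,
                             double pcieBytesPerSec, double gpuFlops) {
        double transfer = (double) items * bytesPerItem / pcieBytesPerSec;
        double compute = items * flopsPerItem / gpuFlops;
        return transfer + compute;
    }

    /** CPU time divided by GPU time; values below 1.0 mean the GPU loses. */
    static double speedup(long items, int bytesPerItem, double flopsPerItem,
                          double cpuFlops, double gpuFlops, double pcieBytesPerSec) {
        return cpuSeconds(items, flopsPerItem, cpuFlops)
             / gpuSeconds(items, bytesPerItem, flopsPerItem, pcieBytesPerSec, gpuFlops);
    }

    public static void main(String[] args) {
        long n = 100_000_000L;   // items in the column
        int bytes = 8;           // one double per item
        double cpu = 50e9;       // assumed CPU throughput, FLOP/s
        double gpu = 2e12;       // assumed GPU throughput, FLOP/s
        double pcie = 12e9;      // assumed effective PCIe bandwidth, bytes/s

        // Cheap per-item op (e.g. x * 2 + 1): the copy dominates.
        System.out.printf("2 FLOP/item:    %.2fx%n", speedup(n, bytes, 2, cpu, gpu, pcie));
        // Expensive per-item op: compute dominates and GPU throughput wins.
        System.out.printf("2000 FLOP/item: %.2fx%n", speedup(n, bytes, 2000, cpu, gpu, pcie));
    }
}
```

With these placeholder numbers it prints roughly 0.06x for the cheap op and 24x for the expensive one: the copy costs the same in both cases, so only the heavy per-item work has enough compute to hide it.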

Something related (and perhaps easier to contribute to Spark) might be a
GPU-accelerated sorter for Unsafe records, especially since that code is
already fairly well factored out, e.g. `UnsafeInMemorySorter`.
Spark appears to use (single-threaded) Timsort for sorting Unsafe records,
so I imagine a massively multi-threaded GPU solution could handily beat that.
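GPU sort libraries (e.g. CUB's `DeviceRadixSort`) generally use radix sort rather than a comparison sort like Timsort, because each pass is a histogram, a prefix sum, and a scatter, all of which parallelize well. Below is a sequential Java sketch of that per-pass structure, not GPU code. It assumes records are represented as hypothetical parallel arrays of 64-bit key prefixes and record pointers (similar in spirit to how `UnsafeInMemorySorter` pairs pointers with key prefixes, but not its actual layout), and it orders keys as unsigned values.

```java
import java.util.Arrays;

/**
 * Sequential sketch of an LSD radix sort over (keyPrefix, pointer) pairs.
 * Each pass is histogram -> exclusive prefix sum -> stable scatter, the three
 * steps a GPU radix sort spreads across many threads.
 * Keys are ordered as unsigned 64-bit values.
 */
public class PrefixRadixSort {

    /** Sorts pointers by their key prefixes; both arrays are reordered in place. */
    static void sort(long[] prefixes, long[] pointers) {
        int n = prefixes.length;
        long[] keyBuf = new long[n];
        long[] ptrBuf = new long[n];
        for (int shift = 0; shift < 64; shift += 8) {
            // 1. Histogram of the current 8-bit digit.
            int[] counts = new int[256];
            for (int i = 0; i < n; i++) {
                counts[(int) ((prefixes[i] >>> shift) & 0xFF)]++;
            }
            // 2. Exclusive prefix sum turns counts into each digit's start offset.
            int sum = 0;
            for (int d = 0; d < 256; d++) {
                int c = counts[d];
                counts[d] = sum;
                sum += c;
            }
            // 3. Stable scatter: equal digits keep their relative order.
            for (int i = 0; i < n; i++) {
                int d = (int) ((prefixes[i] >>> shift) & 0xFF);
                int dst = counts[d]++;
                keyBuf[dst] = prefixes[i];
                ptrBuf[dst] = pointers[i];
            }
            // Copy back for clarity; a tuned version would swap buffers instead.
            System.arraycopy(keyBuf, 0, prefixes, 0, n);
            System.arraycopy(ptrBuf, 0, pointers, 0, n);
        }
    }

    public static void main(String[] args) {
        long[] prefixes = {42L, 7L, 1_000_000L, 7L, 0L};
        long[] pointers = {100L, 101L, 102L, 103L, 104L}; // hypothetical record addresses
        sort(prefixes, pointers);
        System.out.println(Arrays.toString(prefixes)); // [0, 7, 7, 42, 1000000]
        System.out.println(Arrays.toString(pointers)); // [104, 101, 103, 100, 102]
    }
}
```

On a GPU the prefix and pointer arrays would still have to be copied into device memory first, so the transfer and allocation concerns above apply to a sorter too; the bet is that sorting does enough work per item to amortize them.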



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Code-generation-for-GPU-tp13954p14030.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.