I am already looking at the DataFrame APIs and their implementation. In fact, the columnar representation (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnType.scala) is what gave me the idea for my talk proposal: it is ideally suited for computation on a GPU. But from what Reynold said, it appears that the columnar structure is used only for space-efficient in-memory storage and is not exploited for computation such as expression evaluation. Even Project Tungsten invokes operations on a row-by-row basis. UnsafeRow is optimized in the sense that it is only a logical row, as opposed to InternalRow, which holds physical copies of the values. But the computation is still performed per row rather than over batches of rows stored in a columnar structure.
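To make the distinction concrete, here is a minimal sketch in plain Python (not Spark code) of the two evaluation styles for a single expression. The expression a * b + c, the sample data, and the names eval_row_at_a_time and eval_columnar are all made up for illustration; they do not correspond to anything in the Spark source.

```python
from array import array

# Row-oriented layout: one tuple per record, fields (a, b, c).
rows = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0)]

# Columnar layout: one contiguous typed array per field.
columns = {
    "a": array("d", [r[0] for r in rows]),
    "b": array("d", [r[1] for r in rows]),
    "c": array("d", [r[2] for r in rows]),
}


def eval_row_at_a_time(rows):
    """Evaluate a * b + c by invoking the expression once per row."""
    def expr(row):
        a, b, c = row
        return a * b + c

    return [expr(row) for row in rows]


def eval_columnar(columns):
    """Evaluate a * b + c over a whole batch, one column-wide step at a time."""
    a, b, c = columns["a"], columns["b"], columns["c"]
    # Step 1: multiply two whole columns element-wise.
    product = array("d", (x * y for x, y in zip(a, b)))
    # Step 2: add the third column to the intermediate column.
    return array("d", (p + z for p, z in zip(product, c)))


row_result = eval_row_at_a_time(rows)
col_result = list(eval_columnar(columns))
```

Both functions produce the same values. The difference is in the shape of the work: the columnar version is a sequence of uniform loops over contiguous arrays, which is the kind of data-parallel operation I would expect to map naturally onto a GPU kernel, whereas the row version interleaves field access and arithmetic for each record.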
Thanks for the concrete suggestions on the presentation. I do have the core idea or theme of my talk in mind, but I will now present along the lines you suggest. I wasn't really thinking of a demo, but now I will do one.

I was actually hoping to contribute to the Spark code and show results based on those changes rather than on offline changes. I will still try to do that by hooking into the columnar structure, but it may not end up in a shape that can go into the Spark code. That's what I meant by severely limiting the scope of my talk.

I have seen a performance improvement of 5-10 times on expression evaluation even on "ordinary" laptop GPUs, so it should make a good demo along with some concrete proposals for vectorization. As you said, I will have to hook up to a column structure, perform the computation there, let the existing Spark computation proceed as well, and compare the performance of the two.

I will focus on the slides first (the deadline is 7th Oct) and then continue the work for another 3 weeks until the summit. That still gives me enough time to do considerable work. Hope your fear does not come true.