I am already looking at the DataFrame APIs and their implementation. In fact,
the columnar representation
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnType.scala
is what gave me the idea for my talk proposal. It is ideally suited for
computation on a GPU. But from what Reynold said, it appears that the
columnar structure is used only for space-efficient in-memory storage and is
not exploited for computation such as expression evaluation. Even Project
Tungsten invokes its operations row by row. UnsafeRow is optimized in the
sense that it is only a logical row, as opposed to InternalRow, which holds
physical copies of the values. But the computation is still performed per
row rather than on batches of rows stored in a columnar structure.

Thanks for the concrete suggestions on the presentation. I do have the core
idea or theme of my talk in mind, but I will now present it along the lines
you suggest. I wasn't really thinking of a demo, but now I will do one. I
was actually hoping to contribute to the Spark code and show results on
those changes rather than on offline changes. I will still try to do that by
hooking into the columnar structure, but it may not be in a shape that can
go into the Spark code. That is what I meant by severely limiting the scope
of my talk.

I have seen a performance improvement of 5 to 10 times on expression
evaluation even on "ordinary" laptop GPUs, so it should make a good demo
alongside some concrete proposals for vectorization. As you said, I will
have to hook into a columnar structure, perform the computation there, let
the existing Spark computation proceed as well, and compare the performance
of the two.
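
The side-by-side comparison could be structured roughly as in the sketch
below. It is plain Python rather than Spark code, and every name in it
(baseline_eval, candidate_eval, timed, compare) is hypothetical: the baseline
stands in for the existing row-based path and the candidate for the
column-based path. It only shows the shape of the harness, which checks that
both paths agree and records the best wall-clock time of each; it says
nothing about the speedups one would actually see.

```python
import time


def baseline_eval(rows):
    """Stand-in for the existing path: evaluate a * b + c row by row."""
    return [a * b + c for a, b, c in rows]


def candidate_eval(cols):
    """Stand-in for the new path: evaluate a * b + c over whole columns."""
    a, b, c = cols
    return [x * y + z for x, y, z in zip(a, b, c)]


def timed(fn, arg, repeats=5):
    """Run fn(arg) several times; return the result and the best time in seconds."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(arg)
        best = min(best, time.perf_counter() - start)
    return result, best


def compare(n=10000):
    """Run both paths on the same data (n must be at least 1)."""
    rows = [(float(i), 2.0, 1.0) for i in range(n)]
    cols = tuple(list(col) for col in zip(*rows))  # same data, column layout
    base_out, base_s = timed(baseline_eval, rows)
    cand_out, cand_s = timed(candidate_eval, cols)
    return {
        "match": base_out == cand_out,  # correctness check before timing claims
        "baseline_s": base_s,
        "candidate_s": cand_s,
    }


if __name__ == "__main__":
    print(compare())
```

Checking that the two outputs match before looking at the timings keeps the
comparison honest.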

I will focus on the slides first (the deadline is 7th Oct) and then continue
the work for another 3 weeks until the summit. That still gives me enough
time to do considerable work. I hope your fear does not come true.