[ 
https://issues.apache.org/jira/browse/SPARK-12620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-12620.
-------------------------------
    Resolution: Duplicate

[~kiszk] well, now you've just opened a duplicate. That's not helpful, since it 
just splinters the conversation.

Keeping one open would be OK if there were an actionable change here, but 
you're just describing external, experimental work, which does not sound like 
something that needs to be in Spark, at least not now.

> Proposal of GPU exploitation for Spark
> --------------------------------------
>
>                 Key: SPARK-12620
>                 URL: https://issues.apache.org/jira/browse/SPARK-12620
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Kazuaki Ishizaki
>
> I created this new JIRA entry to move the discussion from SPARK-3875.
> Exploiting GPUs can allow us to shorten the execution time of a Spark job and 
> to reduce the number of machines in a cluster. We are working to exploit GPUs 
> on Spark effectively and easily at [http://github.com/kiszk/spark-gpu]. Our 
> project page is [http://kiszk.github.io/spark-gpu/], and a design document is 
> available 
> [here|https://docs.google.com/document/d/1bo1hbQ7ikdUA9LYtYh6kU_TwjFK2ebkHsH66QlmbYP8/edit?usp=sharing].
> Our ideas for exploiting GPUs are:
> # adding a new format for a partition in an RDD, which is a column-based 
> structure in an array format, in addition to the current Iterator\[T\] format 
> with Seq\[T\]
> # generating parallelized GPU native code to access data in the new format 
> from a Spark application program by using an optimizer and code generator 
> (this is similar to [Project 
> Tungsten|https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html])
>  and a pre-compiled library
> The motivation of idea 1 is to reduce the overhead of 
> serializing/deserializing partition data when copying it between CPU and GPU. 
> The motivation of idea 2 is to free application programmers from writing 
> hardware-dependent code. At first, we are working on idea 1 (for idea 2, we 
> still need to write [CUDA|https://en.wikipedia.org/wiki/CUDA] code by hand 
> for now).
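> As a rough illustration of idea 1 (the class and method names below are 
> hypothetical, not the actual spark-gpu API), the row-oriented Iterator\[T\] 
> data of a partition is gathered into one primitive array per column, so the 
> whole partition can be copied between CPU and GPU in a single transfer 
> instead of element by element:
> {code:scala}
> // Hypothetical sketch of idea 1; not the actual spark-gpu classes.
> // Rows of (label, feature) pairs are laid out as one primitive array per
> // column, giving contiguous buffers that can be copied to the GPU at once.
> case class ColumnPartition(labels: Array[Float], features: Array[Float])
>
> def toColumnPartition(rows: Iterator[(Float, Float)]): ColumnPartition = {
>   val buf = rows.toArray                        // materialize the partition
>   ColumnPartition(buf.map(_._1), buf.map(_._2)) // split into column arrays
> }
> {code}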
> This prototype achieved a [3.15x performance 
> improvement|https://github.com/kiszk/spark-gpu/wiki/Benchmark] for the 
> logistic regression example 
> ([SparkGPULR|https://github.com/kiszk/spark-gpu/blob/dev/examples/src/main/scala/org/apache/spark/examples/SparkGPULR.scala])
>  on a 16-thread IvyBridge box with an NVIDIA K40 GPU card, compared with the 
> same box without a GPU card.
> You can download pre-built binaries for x86_64 and ppc64le from 
> [here|https://github.com/kiszk/spark-gpu/wiki/Downloads]. You can also run it 
> on Amazon EC2 by following [this 
> procedure|https://github.com/kiszk/spark-gpu/wiki/How-to-run-%28local-or-AWS-EC2%29].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
