[jira] [Updated] (SPARK-13004) Support Non-Volatile Data and Operations

Sean Owen (JIRA) Tue, 26 Jan 2016 12:46:52 -0800

     [ https://issues.apache.org/jira/browse/SPARK-13004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sean Owen updated SPARK-13004:
------------------------------
    Target Version/s:   (was: 1.6.0)

OK, this sounds interesting, but it sounds like you are already developing it 
separately. I don't see why this is a Spark JIRA (yet). What specific change 
does this propose?

> Support Non-Volatile Data and Operations
> ----------------------------------------
>
>                 Key: SPARK-13004
>                 URL: https://issues.apache.org/jira/browse/SPARK-13004
>             Project: Spark
>          Issue Type: Epic
>          Components: Input/Output, Spark Core
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Wang, Gang
>              Labels: Non-VolatileRDD, Non-volatileComputing, RDD, performance
>
> Based on our experiments, SerDe-like operations have a significant 
> negative performance impact on the majority of industrial Spark workloads, 
> especially when the volume of the datasets is much larger than the system 
> memory the Spark cluster has available for caching, checkpointing, 
> shuffling/dispatching, data loading and storing. JVM on-heap memory 
> management also degrades performance under the pressure of large memory 
> demands and frequent memory allocation/free operations.
> With the trend toward adopting advanced server platform technologies, e.g. 
> large-memory servers, non-volatile memory and NVMe/fast SSD array storage, 
> this project focuses on adopting the new features these server platforms 
> provide for Spark applications and on retrofitting Spark to use hybrid 
> addressable memory resources wherever possible.
> *Data Object Management*
>   * Using our non-volatile generic object programming model (NVGOP) to avoid 
> SerDe as well as reduce GC overhead.
>   * Minimizing the memory footprint by loading data lazily.
>   * Fitting RDD schemas naturally into non-volatile RDDs and off-heap RDDs.
>   * Using non-volatile/off-heap RDDs to transform Spark datasets.
>   * Avoiding the memory-caching step by means of in-place non-volatile RDD 
> operations.
>   * Avoiding checkpoints in Spark computation.
> *Data Memory Management*
>   
>   * Managing heterogeneous memory devices as a unified hybrid memory cache 
> pool for Spark.
>   * Using non-volatile memory-like devices for Spark checkpoints and shuffles.
>   * Supporting automatic reclamation of allocated memory blocks.
>   * Providing unified memory block APIs for general-purpose memory usage.
>   
> *Computing Device Management*
>   * Supporting AVX instructions, programmable FPGAs and GPUs.
>   
> Our customized Spark prototype has shown some potential improvements.
> [https://github.com/NonVolatileComputing/spark/tree/NonVolatileRDD]
> !http://bigdata-memory.github.io/images/Spark_mlib_kmeans.png|width=300!
> !http://bigdata-memory.github.io/images/total_GC_STW_pausetime.png|width=300!
>   
> This epic aims to further improve Spark performance with our 
> non-volatile solutions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)