[jira] [Created] (SPARK-11879) Checkpoint support for DataFrame

Cristian (JIRA) Fri, 20 Nov 2015 04:33:42 -0800

Cristian created SPARK-11879:
--------------------------------

             Summary: Checkpoint support for DataFrame
                 Key: SPARK-11879
                 URL: https://issues.apache.org/jira/browse/SPARK-11879
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.5.2
            Reporter: Cristian



Explicit support for checkpointing DataFrames is needed to be able to truncate 
lineages, prune the query plan (particularly the logical plan), and provide 
transparent failure recovery.

While saving to a Parquet file may be sufficient for recovery, actually using 
that file as a checkpoint (and truncating the lineage) requires reading the 
files back.

This is required in order to use DataFrames in iterative scenarios such as 
Streaming and ML, and to avoid expensive recomputation after an executor 
failure when executing a complex chain of queries on very large datasets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

