Cristian created SPARK-11879:
--------------------------------
Summary: Checkpoint support for DataFrame
Key: SPARK-11879
URL: https://issues.apache.org/jira/browse/SPARK-11879
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.5.2
Reporter: Cristian
Explicit support for checkpointing DataFrames is needed to truncate lineages,
prune the query plan (particularly the logical plan), and provide transparent
failure recovery.
For recovery, saving to a Parquet file may be sufficient, but actually using
that file as a checkpoint (and truncating the lineage) requires reading it
back.
This is required to use DataFrames in iterative scenarios such as Streaming
and ML, and to avoid expensive recomputation after an executor failure when a
complex chain of queries runs on very large datasets.