Marek Simunek created BEAM-6053:
-----------------------------------
Summary: Add option to disable caching in Spark
Key: BEAM-6053
URL: https://issues.apache.org/jira/browse/BEAM-6053
Project: Beam
Issue Type: Improvement
Components: runner-spark
Affects Versions: 2.9.0
Reporter: Marek Simunek
Assignee: Amit Sela
Add an option to SparkOptions to turn off Spark RDD caching. There are use
cases where it is faster to recompute the whole RDD than to serialize it,
store it, read it back from storage, and deserialize it.
We probably don't want a list of `PCollections` that should not be cached,
because such a list would be tailored to a specific runner and would go
against Beam's runner-independent model. So I propose turning off caching
for the whole pipeline.
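A minimal sketch of how such a pipeline-wide switch could interact with the
runner's caching decision. It is written in plain Java so it runs without
Beam or Spark on the classpath. `HypotheticalSparkOptions`,
`isCacheDisabled()` and `shouldCache` are hypothetical names, not the actual
SparkOptions API. The rule "cache a dataset only when more than one transform
consumes it" is an assumed stand-in for the runner's real caching heuristic.
The point it illustrates is the one in the proposal: a single flag overrides
the per-dataset decision for every PCollection, so no runner-specific list of
PCollections is needed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CacheDecisionSketch {

    // Hypothetical stand-in for a pipeline-wide Spark runner option; not a real Beam class.
    static final class HypotheticalSparkOptions {
        private final boolean cacheDisabled;

        HypotheticalSparkOptions(boolean cacheDisabled) {
            this.cacheDisabled = cacheDisabled;
        }

        boolean isCacheDisabled() {
            return cacheDisabled;
        }
    }

    // Assumed heuristic: cache a dataset only when more than one transform consumes it,
    // unless caching has been switched off for the whole pipeline.
    static boolean shouldCache(HypotheticalSparkOptions options, int consumerCount) {
        if (options.isCacheDisabled()) {
            return false; // the global switch overrides the per-dataset heuristic
        }
        return consumerCount > 1;
    }

    public static void main(String[] args) {
        // Hypothetical PCollection names mapped to how many transforms read them.
        Map<String, Integer> consumers = new LinkedHashMap<>();
        consumers.put("parsed", 3);
        consumers.put("filtered", 1);

        for (boolean disabled : new boolean[] {false, true}) {
            HypotheticalSparkOptions options = new HypotheticalSparkOptions(disabled);
            for (Map.Entry<String, Integer> e : consumers.entrySet()) {
                System.out.println("cacheDisabled=" + disabled + " " + e.getKey()
                        + " -> cache=" + shouldCache(options, e.getValue()));
            }
        }
    }
}
```

With the flag off, only the multiply-consumed dataset is cached; with it on,
every dataset is recomputed instead, which is the trade-off described above.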
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)