Okie doke, I've filed a JIRA for this here: https://issues.apache.org/jira/browse/SPARK-16921
On Fri, Aug 5, 2016 at 2:08 AM Reynold Xin <r...@databricks.com> wrote:

> Sounds like a great idea!
>
> On Friday, August 5, 2016, Nicholas Chammas <nicholas.cham...@gmail.com>
> wrote:
>
>> Context managers
>> <https://docs.python.org/3/reference/datamodel.html#context-managers>
>> are a natural way to capture closely related setup and teardown code in
>> Python.
>>
>> For example, they are commonly used when doing file I/O:
>>
>>     with open('/path/to/file') as f:
>>         contents = f.read()
>>         ...
>>
>> Once the program exits the with block, f is automatically closed.
>>
>> Does it make sense to apply this pattern to persisting and unpersisting
>> DataFrames and RDDs? I feel like there are many cases when you want to
>> persist a DataFrame for a specific set of operations and then unpersist
>> it immediately afterwards.
>>
>> For example, take model training. Today, you might do something like
>> this:
>>
>>     labeled_data.persist()
>>     model = pipeline.fit(labeled_data)
>>     labeled_data.unpersist()
>>
>> If persist() returned a context manager, you could rewrite this as
>> follows:
>>
>>     with labeled_data.persist():
>>         model = pipeline.fit(labeled_data)
>>
>> Upon exiting the with block, labeled_data would automatically be
>> unpersisted.
>>
>> This can be done in a backwards-compatible way, since persist() would
>> still return the parent DataFrame or RDD as it does today, but it would
>> add two methods to the object: __enter__() and __exit__().
>>
>> Does this make sense? Is it attractive?
>>
>> Nick
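For anyone following along, the backwards-compatible `__enter__`/`__exit__` idea can be sketched without Spark itself. `PersistableFrame` below is a hypothetical stand-in for a DataFrame/RDD (not actual PySpark code); the point is just that `persist()` keeps returning the object, so existing call sites are unaffected, while the two added dunder methods make it usable in a `with` block:

```python
class PersistableFrame:
    """Hypothetical stand-in for a Spark DataFrame/RDD, illustrating
    the proposed context-manager support for persist()/unpersist()."""

    def __init__(self):
        self.persisted = False

    def persist(self):
        self.persisted = True
        # persist() still returns the parent object, as it does today,
        # so existing code that chains off the return value keeps working.
        return self

    def unpersist(self):
        self.persisted = False
        return self

    # The two additions: these make the object a context manager.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always unpersist on exit, even if the block raised.
        self.unpersist()
        # Returning False lets any exception from the block propagate.
        return False


labeled_data = PersistableFrame()
with labeled_data.persist():
    assert labeled_data.persisted      # cached while we "train"
assert not labeled_data.persisted      # automatically unpersisted on exit
```

Because `__exit__` runs even when the body raises, this also guarantees cleanup on failure, which the manual `persist()` / `unpersist()` pattern does not.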