Sounds like a great idea! On Friday, August 5, 2016, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
> Context managers
> <https://docs.python.org/3/reference/datamodel.html#context-managers> are
> a natural way to capture closely related setup and teardown code in Python.
>
> For example, they are commonly used when doing file I/O:
>
>     with open('/path/to/file') as f:
>         contents = f.read()
>         ...
>
> Once the program exits the with block, f is automatically closed.
>
> Does it make sense to apply this pattern to persisting and unpersisting
> DataFrames and RDDs? I feel like there are many cases when you want to
> persist a DataFrame for a specific set of operations and then unpersist it
> immediately afterwards.
>
> For example, take model training. Today, you might do something like this:
>
>     labeled_data.persist()
>     model = pipeline.fit(labeled_data)
>     labeled_data.unpersist()
>
> If persist() returned a context manager, you could rewrite this as
> follows:
>
>     with labeled_data.persist():
>         model = pipeline.fit(labeled_data)
>
> Upon exiting the with block, labeled_data would automatically be
> unpersisted.
>
> This can be done in a backwards-compatible way since persist() would
> still return the parent DataFrame or RDD as it does today, but add two
> methods to the object: __enter__() and __exit__().
>
> Does this make sense? Is it attractive?
>
> Nick
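
For anyone curious what the change would amount to, here is a minimal sketch of the __enter__()/__exit__() additions. It uses a stub class in place of a real DataFrame/RDD (the stub and its persisted flag are illustrative, not PySpark API); on the real classes, only the two dunder methods would be new:

```python
class PersistableStub:
    """Stand-in for a DataFrame/RDD; tracks persisted state only."""

    def __init__(self):
        self.persisted = False

    def persist(self):
        self.persisted = True
        return self  # backwards-compatible: still returns the object itself

    def unpersist(self):
        self.persisted = False
        return self

    # The two methods the proposal would add:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.unpersist()
        return False  # don't suppress exceptions raised in the block


labeled_data = PersistableStub()
with labeled_data.persist():
    assert labeled_data.persisted      # persisted inside the block
assert not labeled_data.persisted      # automatically unpersisted on exit
```

Because __exit__() runs even when the block raises, this would also unpersist reliably in the failure path, which the manual persist()/unpersist() pattern does not guarantee.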