This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new dfff8d8 [SPARK-38353][PYTHON] Instrument __enter__ and __exit__ magic methods for Pandas API on Spark
dfff8d8 is described below
commit dfff8d8cd0e747a261bf74374a5797e7c2acaebb
Author: Yihong He <[email protected]>
AuthorDate: Thu Mar 3 20:51:08 2022 +0900
[SPARK-38353][PYTHON] Instrument __enter__ and __exit__ magic methods for Pandas API on Spark
### What changes were proposed in this pull request?
- Add the magic methods `__enter__` and `__exit__` to **the special_function list**
### Why are the changes needed?
- Improve the accuracy of usage data for the **with statement**, so that the external `__enter__` and `__exit__` calls are captured instead of the internal calls made inside them
For example, consider the code below:
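```python
import pandas as pd
import pyspark.pandas as ps
from pyspark import StorageLevel
from pyspark.pandas.frame import CachedDataFrame

pdf = pd.DataFrame(
    [(0.2, 0.3), (0.0, 0.6), (0.6, 0.0), (0.2, 0.1)], columns=["dogs", "cats"]
)
psdf = ps.from_pandas(pdf)
# Excerpt from a unit test method: `self.assert_eq` is the test-case helper.
with psdf.spark.cache() as cached_df:
    self.assert_eq(isinstance(cached_df, CachedDataFrame), True)
    self.assert_eq(
        repr(cached_df.spark.storage_level), repr(StorageLevel(True, True, False, True))
    )
```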
For the `with` block, the pandas-on-Spark usage logger records the internal call
[self.spark.unpersist()](https://github.com/apache/spark/blob/master/python/pyspark/pandas/frame.py#L12518)
made inside `__exit__`, rather than the `__enter__` and `__exit__` calls themselves, because the `__enter__` and `__exit__` methods of
[CachedDataFrame](https://github.com/apache/spark/blob/master/python/pyspark/pandas/frame.py#L12492)
are not instrumented.
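To illustrate the mechanism, here is a minimal, self-contained sketch in plain Python. It assumes a logger that records only the outermost instrumented call and skips calls nested inside it, which is the behavior the description above relies on. Every name in it (`instrument`, `_wrap`, `ToyCachedFrame`, `LOG`) is hypothetical and this is not the pyspark implementation; `ToyCachedFrame.__exit__` merely mimics the internal `unpersist()` call.

```python
import functools
import threading

# Hypothetical call log standing in for a real usage logger.
LOG = []
_local = threading.local()


def _wrap(class_name, name, func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # A call made while another instrumented call is already running is
        # treated as internal and not logged; only the outermost call is recorded.
        if getattr(_local, "active", False):
            return func(*args, **kwargs)
        _local.active = True
        try:
            LOG.append(f"{class_name}.{name}")
            return func(*args, **kwargs)
        finally:
            _local.active = False

    return wrapper


def instrument(cls, special_functions):
    """Wrap public methods of cls plus any listed special (dunder) methods."""
    for name, attr in list(vars(cls).items()):
        if callable(attr) and (not name.startswith("_") or name in special_functions):
            setattr(cls, name, _wrap(cls.__name__, name, attr))
    return cls


def make_frame_class():
    # A fresh class per call, so it can be instrumented in different ways.
    class ToyCachedFrame:
        def unpersist(self):
            pass

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            self.unpersist()  # internal call made on exit

    return ToyCachedFrame


def run(special_functions):
    """Instrument a fresh class, use it in a with block, return what was logged."""
    LOG.clear()
    cls = instrument(make_frame_class(), special_functions)
    with cls():
        pass
    return list(LOG)


print(run({"__init__"}))  # prints ['ToyCachedFrame.unpersist']
print(run({"__init__", "__enter__", "__exit__"}))
# prints ['ToyCachedFrame.__enter__', 'ToyCachedFrame.__exit__']
```

With only public methods instrumented, the internal `unpersist` call is the outermost logged call. Once `__enter__` and `__exit__` are in the special-function set, the `with` statement's own calls become the outermost ones, so they are recorded and the nested `unpersist` is skipped.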
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing unit tests
Closes #35687 from heyihong/SPARK-38353.
Authored-by: Yihong He <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/pandas/usage_logging/__init__.py | 2 ++
1 file changed, 2 insertions(+)
diff --git a/python/pyspark/pandas/usage_logging/__init__.py b/python/pyspark/pandas/usage_logging/__init__.py
index b350faf..10fe616 100644
--- a/python/pyspark/pandas/usage_logging/__init__.py
+++ b/python/pyspark/pandas/usage_logging/__init__.py
@@ -135,6 +135,8 @@ def attach(logger_module: Union[str, ModuleType]) -> None:
"__getitem__",
"__setitem__",
"__getattr__",
+ "__enter__",
+ "__exit__",
]
)