This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new dfff8d8  [SPARK-38353][PYTHON] Instrument __enter__ and __exit__ magic methods for Pandas API on Spark
dfff8d8 is described below

commit dfff8d8cd0e747a261bf74374a5797e7c2acaebb
Author: Yihong He <[email protected]>
AuthorDate: Thu Mar 3 20:51:08 2022 +0900

    [SPARK-38353][PYTHON] Instrument __enter__ and __exit__ magic methods for Pandas API on Spark
    
    ### What changes were proposed in this pull request?
    
    - Add the magic methods \_\_enter\_\_ and \_\_exit\_\_ to **the special_function list**
    
    ### Why are the changes needed?
    
    - Improve the accuracy of usage data for the **with statement** so that the external \_\_enter\_\_ and \_\_exit\_\_ calls are captured instead of the internal calls they make
    
    For example, for the code below:
    
    ```python
    pdf = pd.DataFrame(
        [(0.2, 0.3), (0.0, 0.6), (0.6, 0.0), (0.2, 0.1)], columns=["dogs", "cats"]
    )
    psdf = ps.from_pandas(pdf)
    
    with psdf.spark.cache() as cached_df:
        self.assert_eq(isinstance(cached_df, CachedDataFrame), True)
        self.assert_eq(
            repr(cached_df.spark.storage_level), repr(StorageLevel(True, True, False, True))
        )
    ```
    
    The pandas-on-Spark usage logger records the internal call [self.spark.unpersist()](https://github.com/apache/spark/blob/master/python/pyspark/pandas/frame.py#L12518) because the \_\_enter\_\_ and \_\_exit\_\_ methods of [CachedDataFrame](https://github.com/apache/spark/blob/master/python/pyspark/pandas/frame.py#L12492) are not instrumented.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Existing unit tests
    
    Closes #35687 from heyihong/SPARK-38353.
    
    Authored-by: Yihong He <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/pandas/usage_logging/__init__.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/python/pyspark/pandas/usage_logging/__init__.py b/python/pyspark/pandas/usage_logging/__init__.py
index b350faf..10fe616 100644
--- a/python/pyspark/pandas/usage_logging/__init__.py
+++ b/python/pyspark/pandas/usage_logging/__init__.py
@@ -135,6 +135,8 @@ def attach(logger_module: Union[str, ModuleType]) -> None:
             "__getitem__",
             "__setitem__",
             "__getattr__",
+            "__enter__",
+            "__exit__",
         ]
     )
 

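The effect described in the commit message can be illustrated with a minimal, self-contained sketch. It does not use the actual pyspark code: the names `wrap`, `instrument`, `Cached`, and `calls` are hypothetical, and the assumption is that the logger records only the outermost instrumented call on a thread. Under that assumption, leaving `__enter__` and `__exit__` uninstrumented makes the internal `unpersist()` call look like the user-facing one.

```python
import functools
import threading

_local = threading.local()
calls = []  # names of the calls this sketch logger recorded


def wrap(cls_name, name, func):
    """Log only the outermost instrumented call on the current thread."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if getattr(_local, "logging", False):
            # Already inside an instrumented call: internal, so not logged.
            return func(*args, **kwargs)
        _local.logging = True
        try:
            return func(*args, **kwargs)
        finally:
            calls.append(f"{cls_name}.{name}")
            _local.logging = False

    return wrapper


def instrument(cls, special_functions):
    """Wrap public methods plus the listed special methods of cls."""
    for name, attr in list(vars(cls).items()):
        if callable(attr) and (not name.startswith("_") or name in special_functions):
            setattr(cls, name, wrap(cls.__name__, name, attr))


def make_cached_class():
    # Hypothetical stand-in for a cached frame used as a context manager.
    class Cached:
        def unpersist(self):
            pass

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            self.unpersist()  # internal call made on behalf of the user

    return Cached


def run(special_functions):
    """Instrument a fresh class, use it in a with statement, return the log."""
    calls.clear()
    cls = make_cached_class()
    instrument(cls, special_functions)
    with cls():
        pass
    return list(calls)


before = run(set())                      # magic methods not instrumented
after = run({"__enter__", "__exit__"})   # magic methods instrumented
print(before)
print(after)
```

In this sketch, the first run logs only the internal `unpersist` call, while the second logs the `__enter__` and `__exit__` calls that the `with` statement itself triggers.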