This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 10c0777 [SPARK-37263][PYTHON] Add PandasAPIOnSparkAdviceWarning class
10c0777 is described below
commit 10c0777eb4cb05cdcdf776959cb09efae7577b20
Author: itholic <[email protected]>
AuthorDate: Fri Nov 12 10:10:16 2021 +0900
[SPARK-37263][PYTHON] Add PandasAPIOnSparkAdviceWarning class
### What changes were proposed in this pull request?
This PR proposes adding a warning class, `PandasAPIOnSparkAdviceWarning`, so
that users can manually turn the warning off by using `warnings.simplefilter`.
The `PandasAPIOnSparkAdviceWarning` is issued by default as below:
```python
>>> psdf.to_pandas()
/Users/haejoon.lee/Desktop/git_store/spark/python/pyspark/pandas/utils.py:971:
PandasAPIOnSparkAdviceWarning: `to_pandas` loads all data into the driver's
memory. It should only be used if the resulting pandas DataFrame is expected to
be small.
  warnings.warn(message, PandasAPIOnSparkAdviceWarning)
   A
0  1
1  2
2  3
3  4
```
To silence the advice warning message, call `warnings.simplefilter` with the
`PandasAPIOnSparkAdviceWarning` class as the category, as below:
```python
>>> import warnings
>>> from pyspark.pandas.utils import PandasAPIOnSparkAdviceWarning
>>> with warnings.catch_warnings():
...     warnings.simplefilter('ignore', PandasAPIOnSparkAdviceWarning)
...     psdf.to_pandas()
...
   A
0  1
1  2
2  3
3  4
```
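If the advice should be silenced for a whole session rather than inside a single `with` block, the standard library's `warnings.filterwarnings` accepts the same category. The following is a minimal sketch, not part of this patch: it uses a local stand-in for `PandasAPIOnSparkAdviceWarning` so that it runs without Spark, and `log_advice_demo` and `count_advice_warnings` are hypothetical helpers introduced only for illustration.

```python
import warnings


# Stand-in for pyspark.pandas.utils.PandasAPIOnSparkAdviceWarning so this
# sketch runs without a Spark installation (assumption for illustration).
class PandasAPIOnSparkAdviceWarning(Warning):
    pass


def log_advice_demo(message: str) -> None:
    # Hypothetical helper that mirrors what log_advice does after this commit.
    warnings.warn(message, PandasAPIOnSparkAdviceWarning)


def count_advice_warnings(silence: bool) -> int:
    """Return how many advice warnings reach the caller."""
    with warnings.catch_warnings(record=True) as caught:
        # Start from a known filter state so the result is deterministic.
        warnings.simplefilter("always")
        if silence:
            # Same call a user could make once at the top of a script.
            warnings.filterwarnings(
                "ignore", category=PandasAPIOnSparkAdviceWarning
            )
        log_advice_demo("`to_pandas` loads all data into the driver's memory.")
    return len(caught)
```

Called outside a `catch_warnings` block, the same `filterwarnings` call stays in effect for the rest of the process, which may or may not be desirable depending on how much of the advice a user wants to keep.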
### Why are the changes needed?
The advice messages can be too verbose in some situations, so users may want a
way to suppress them without hiding other warnings.
### Does this PR introduce _any_ user-facing change?
The warning category issued by `log_advice` is changed from `UserWarning` to
`PandasAPIOnSparkAdviceWarning`.
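One consequence worth noting: in the diff the new class derives from `Warning` directly, not from `UserWarning`, so a filter that targets `UserWarning` should no longer match the advice messages. The sketch below illustrates this with a local stand-in class (so it runs without Spark); `is_suppressed_by_userwarning_filter` is a hypothetical helper, not part of PySpark.

```python
import warnings


# Stand-in with the same base class as the one added in this commit.
class PandasAPIOnSparkAdviceWarning(Warning):
    pass


def is_suppressed_by_userwarning_filter(category: type) -> bool:
    """Check whether an 'ignore UserWarning' filter hides the given category."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")               # record everything...
        warnings.simplefilter("ignore", UserWarning)  # ...except UserWarning
        warnings.warn("advice", category)
    return len(caught) == 0


# The old category is hidden by the filter, the new one is not.
old_hidden = is_suppressed_by_userwarning_filter(UserWarning)
new_hidden = is_suppressed_by_userwarning_filter(PandasAPIOnSparkAdviceWarning)
```

Users who previously relied on a `UserWarning` filter would therefore need to name `PandasAPIOnSparkAdviceWarning` explicitly.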
### How was this patch tested?
Manually tested.
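No automated test is included in this patch. As a rough sketch of what one could look like, the example below uses `unittest.TestCase.assertWarns` against local stand-ins for the class and for `log_advice` (both redefined here so the block runs without Spark; `LogAdviceTest` is a hypothetical name).

```python
import unittest
import warnings


# Local stand-ins mirroring the definitions touched by this commit.
class PandasAPIOnSparkAdviceWarning(Warning):
    pass


def log_advice(message: str) -> None:
    warnings.warn(message, PandasAPIOnSparkAdviceWarning)


class LogAdviceTest(unittest.TestCase):
    def test_category(self) -> None:
        # assertWarns fails the test unless a warning of this category is raised.
        with self.assertWarns(PandasAPIOnSparkAdviceWarning) as ctx:
            log_advice("advice text")
        # The caught warning instance carries the original message.
        self.assertEqual(str(ctx.warning), "advice text")
        # The new category is not a UserWarning.
        self.assertNotIsInstance(ctx.warning, UserWarning)
```

In the real code base such a test would import the class and `log_advice` from `pyspark.pandas.utils` instead of redefining them.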
Closes #34550 from itholic/SPARK-37263.
Authored-by: itholic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/pandas/utils.py | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/python/pyspark/pandas/utils.py b/python/pyspark/pandas/utils.py
index e07d416..be71d70 100644
--- a/python/pyspark/pandas/utils.py
+++ b/python/pyspark/pandas/utils.py
@@ -65,6 +65,10 @@ ERROR_MESSAGE_CANNOT_COMBINE = (
SPARK_CONF_ARROW_ENABLED = "spark.sql.execution.arrow.pyspark.enabled"
+class PandasAPIOnSparkAdviceWarning(Warning):
+    pass
+
+
def same_anchor(
this: Union["DataFrame", "IndexOpsMixin", "InternalFrame"],
that: Union["DataFrame", "IndexOpsMixin", "InternalFrame"],
@@ -964,7 +968,7 @@ def log_advice(message: str) -> None:
for the existing pandas/PySpark users who may not be familiar with
distributed environments
or the behavior of pandas.
"""
-    warnings.warn(message, UserWarning)
+    warnings.warn(message, PandasAPIOnSparkAdviceWarning)
def _test() -> None: