This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new a43f93d76c9d [SPARK-46728][PYTHON] Check Pandas installation properly
a43f93d76c9d is described below
commit a43f93d76c9d0e12cf8c79419e55abd4601a1fe4
Author: Haejoon Lee <[email protected]>
AuthorDate: Wed Jan 24 10:33:05 2024 +0900
[SPARK-46728][PYTHON] Check Pandas installation properly
### What changes were proposed in this pull request?
This PR fixes the Pandas installation check so that an incompletely removed Pandas is treated as not installed.
### Why are the changes needed?
The Pandas installation check does not work correctly: when Pandas is not
installed, it can raise a misleading exception instead of the intended one.
This happens because uninstalling Pandas does not remove it completely when a
related extension package (e.g. `pandas-stubs`) is installed, so `import pandas`
can still succeed.
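As a rough illustration of how an import can succeed while `__version__` is missing, the sketch below builds a throwaway directory with no `__init__.py` and imports it. Python treats such a directory as a namespace package. The package name `leftover_pkg_demo` and the helper function are hypothetical, and it is an assumption that the leftover Pandas path behaves the same way; the source does not describe the exact mechanism.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_leftover_package(name: str = "leftover_pkg_demo"):
    """Import a directory that has no __init__.py and return the module object."""
    with tempfile.TemporaryDirectory() as site_dir:
        # Simulate a partially removed package: the directory survives,
        # but the __init__.py that would define __version__ is gone.
        pkg_dir = Path(site_dir) / name
        pkg_dir.mkdir()
        (pkg_dir / "leftover.pyi").write_text("")

        sys.path.insert(0, site_dir)
        importlib.invalidate_caches()
        try:
            # Succeeds: the bare directory is imported as a namespace package.
            return importlib.import_module(name)
        finally:
            # Clean up so the demo leaves no trace in the interpreter state.
            sys.path.remove(site_dir)
            sys.modules.pop(name, None)


module = import_leftover_package()
print(hasattr(module, "__version__"))
```

No `ImportError` is raised here, so a check that relies only on the import succeeding would wrongly conclude the package is installed.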
### Does this PR introduce _any_ user-facing change?
No API change, but the user-facing error message now correctly tells the user
that Pandas is missing:
**Before**
```python
>>> import pyspark.pandas
AttributeError: module 'pandas' has no attribute '__version__'
```
**After**
```python
>>> import pyspark.pandas
pyspark.errors.exceptions.base.PySparkImportError: [PACKAGE_NOT_INSTALLED]
Pandas >= 1.4.4 must be installed; however, it was not found.
```
### How was this patch tested?
Manually tested
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #44745 from itholic/pandas_check.
Authored-by: Haejoon Lee <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/sql/pandas/utils.py | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/python/pyspark/sql/pandas/utils.py b/python/pyspark/sql/pandas/utils.py
index 63554c5a50ce..ff8183c61746 100644
--- a/python/pyspark/sql/pandas/utils.py
+++ b/python/pyspark/sql/pandas/utils.py
@@ -27,7 +27,15 @@ def require_minimum_pandas_version() -> None:
     try:
         import pandas
-        have_pandas = True
+        # Even if pandas is deleted, if the pandas extension package (e.g. pandas-stubs) is still
+        # installed, the pandas path will not be completely deleted.
+        # Therefore, even if the import is successful, additional check is required here to verify
+        # that pandas is actually installed properly.
+        if hasattr(pandas, "__version__"):
+            have_pandas = True
+        else:
+            have_pandas = False
+            raised_error = None
     except ImportError as error:
         have_pandas = False
         raised_error = error
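
The pattern in the hunk above (import first, then confirm the module exposes `__version__`) can be sketched outside PySpark as follows. `require_installed` and `broken_pkg_demo` are hypothetical names used only for illustration, and a plain `ImportError` stands in for `PySparkImportError`. This is not the PySpark implementation.

```python
import importlib
import sys
import types


def require_installed(name: str) -> types.ModuleType:
    """Hypothetical helper: import `name` and verify it looks like a real install."""
    not_found = f"{name} must be installed; however, it was not found."
    try:
        module = importlib.import_module(name)
    except ImportError as error:
        raise ImportError(not_found) from error
    # A successful import is not enough: a leftover shell of the package
    # can import fine yet lack the attributes a real install defines.
    if not hasattr(module, "__version__"):
        raise ImportError(not_found)
    return module


# Simulate a broken install with an empty module object.
sys.modules["broken_pkg_demo"] = types.ModuleType("broken_pkg_demo")
try:
    require_installed("broken_pkg_demo")
except ImportError as error:
    message = str(error)
finally:
    del sys.modules["broken_pkg_demo"]

print(message)
```

Both failure paths produce the same message, so the caller sees one consistent "not installed" error whether the import fails outright or returns an incomplete module.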