joaofauvel commented on issue #1607:
URL: https://github.com/apache/sedona/issues/1607#issuecomment-2377823340
In Databricks, the following error is logged to the driver logs (stderr) if
Sedona's Python package is installed as a cluster library:
```
Traceback (most recent call last):
  File "/databricks/python_shell/scripts/db_ipykernel_launcher.py", line 21, in <module>
    from dbruntime.DatasetInfo import UserNamespaceCommandHook, UserNamespaceDict
  File "/databricks/python_shell/dbruntime/DatasetInfo.py", line 10, in <module>
    from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame
  File "/databricks/spark/python/pyspark/sql/connect/dataframe.py", line 26, in <module>
    check_dependencies(__name__)
  File "/databricks/spark/python/pyspark/sql/connect/utils.py", line 36, in check_dependencies
    require_minimum_pandas_version()
  File "/databricks/spark/python/pyspark/sql/pandas/utils.py", line 29, in require_minimum_pandas_version
    import pandas
  File "/databricks/python/lib/python3.11/site-packages/pandas/__init__.py", line 22, in <module>
    from pandas.compat import is_numpy_dev as _is_numpy_dev  # pyright: ignore # noqa:F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/python/lib/python3.11/site-packages/pandas/compat/__init__.py", line 18, in <module>
    from pandas.compat.numpy import (
  File "/databricks/python/lib/python3.11/site-packages/pandas/compat/numpy/__init__.py", line 4, in <module>
    from pandas.util.version import Version
  File "/databricks/python/lib/python3.11/site-packages/pandas/util/__init__.py", line 2, in <module>
    from pandas.util._decorators import (  # noqa:F401
  File "/databricks/python/lib/python3.11/site-packages/pandas/util/_decorators.py", line 14, in <module>
    from pandas._libs.properties import cache_readonly
  File "/databricks/python/lib/python3.11/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
  File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
```
If Sedona is instead installed from within the notebook, it runs, but it
causes all sorts of issues or kernel crashes.
If it is installed as a cluster library, running any code results in:
```
Failure starting repl. Try detaching and re-attaching the notebook.
```
DBR ML 15 or lower installs ydata-profiling 4.5.1, which indirectly depends on
numpy 1.23.x. However, rasterio 1.4.0, a Sedona dependency released yesterday,
now requires numpy >=1.24. This conflict should cause Sedona's installation to
fail, but it doesn't, because of pip's behavior and because numpy<2 is not
specified in Sedona's setup.py. As a result, the latest numpy gets installed,
and Databricks hits issues or crashes because numpy 2 is incompatible with the
pandas version required by DBR ML 15 or lower.
The workaround is to install rasterio<1.4.0 before installing Sedona.
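
The following is a minimal sketch of that workaround and a quick check of the versions involved, not an official Sedona or Databricks procedure. It assumes the PyPI package name `apache-sedona`; the helper names (`installed_version`, `is_numpy_2_or_later`, `pip_install_command`, `apply_workaround`) are hypothetical and exist only for this illustration. The version check reads package metadata instead of importing numpy or pandas, since the import itself is what fails in the traceback above.

```python
import subprocess
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent.

    Only package metadata is read, so this works even when importing
    pandas fails with the binary-incompatibility error.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_numpy_2_or_later(numpy_version: str) -> bool:
    """True if the major component of the version string is 2 or higher."""
    return int(numpy_version.split(".")[0]) >= 2


def pip_install_command(*requirements: str) -> List[str]:
    """Build a pip command for the current interpreter (nothing is run here)."""
    return [sys.executable, "-m", "pip", "install", *requirements]


# Order matters: pin rasterio below 1.4.0 first, then install Sedona.
WORKAROUND_COMMANDS = [
    pip_install_command("rasterio<1.4.0"),
    pip_install_command("apache-sedona"),  # assumed PyPI package name
]


def apply_workaround() -> None:
    """Run both installs in order. Must be called explicitly."""
    for command in WORKAROUND_COMMANDS:
        subprocess.check_call(command)


def report_versions() -> Dict[str, Optional[str]]:
    """Collect the versions of the packages discussed in this comment."""
    names = ("numpy", "pandas", "rasterio", "apache-sedona")
    return {name: installed_version(name) for name in names}
```

In a Databricks notebook the same two steps would presumably be `%pip install "rasterio<1.4.0"` followed by `%pip install apache-sedona`. Calling `report_versions()` afterwards and passing the numpy entry to `is_numpy_2_or_later` indicates whether numpy 2 was still pulled in.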