joaofauvel commented on issue #1607:
URL: https://github.com/apache/sedona/issues/1607#issuecomment-2377823340

In Databricks, the following error is logged to the driver's stderr log when Sedona Python is installed as a cluster library:
```
Traceback (most recent call last):
  File "/databricks/python_shell/scripts/db_ipykernel_launcher.py", line 21, in <module>
    from dbruntime.DatasetInfo import UserNamespaceCommandHook, UserNamespaceDict
  File "/databricks/python_shell/dbruntime/DatasetInfo.py", line 10, in <module>
    from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame
  File "/databricks/spark/python/pyspark/sql/connect/dataframe.py", line 26, in <module>
    check_dependencies(__name__)
  File "/databricks/spark/python/pyspark/sql/connect/utils.py", line 36, in check_dependencies
    require_minimum_pandas_version()
  File "/databricks/spark/python/pyspark/sql/pandas/utils.py", line 29, in require_minimum_pandas_version
    import pandas
  File "/databricks/python/lib/python3.11/site-packages/pandas/__init__.py", line 22, in <module>
    from pandas.compat import is_numpy_dev as _is_numpy_dev  # pyright: ignore # noqa:F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/python/lib/python3.11/site-packages/pandas/compat/__init__.py", line 18, in <module>
    from pandas.compat.numpy import (
  File "/databricks/python/lib/python3.11/site-packages/pandas/compat/numpy/__init__.py", line 4, in <module>
    from pandas.util.version import Version
  File "/databricks/python/lib/python3.11/site-packages/pandas/util/__init__.py", line 2, in <module>
    from pandas.util._decorators import (  # noqa:F401
  File "/databricks/python/lib/python3.11/site-packages/pandas/util/_decorators.py", line 14, in <module>
    from pandas._libs.properties import cache_readonly
  File "/databricks/python/lib/python3.11/site-packages/pandas/_libs/__init__.py", line 13, in <module>
    from pandas._libs.interval import Interval
  File "pandas/_libs/interval.pyx", line 1, in init pandas._libs.interval
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
```
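Because the import that fails is `pandas` itself, one way to confirm a numpy/pandas mismatch is to read the installed versions from package metadata instead of importing the packages. The following is a minimal diagnostic sketch, not part of the original report; it assumes it runs under the same Python interpreter as the notebook, and the helper names `major_version` and `report` are hypothetical.

```python
import importlib.metadata as md


def major_version(version: str) -> int:
    """Return the leading integer component of a version string such as '2.1.1'."""
    head = version.split(".", 1)[0]
    # Keep only digits so pre-release suffixes on the first component are ignored.
    return int("".join(ch for ch in head if ch.isdigit()))


def report(packages=("numpy", "pandas", "rasterio", "apache-sedona")) -> dict:
    """Collect installed versions from distribution metadata without importing them.

    Importing pandas is what raises the dtype-size ValueError, so this
    only reads metadata. Missing packages map to None.
    """
    found = {}
    for name in packages:
        try:
            found[name] = md.version(name)
        except md.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    versions = report()
    for name, ver in versions.items():
        print(f"{name}: {ver or 'not installed'}")
    numpy_ver = versions["numpy"]
    if numpy_ver is not None and major_version(numpy_ver) >= 2:
        # A pandas build compiled against numpy 1.x can fail to import here.
        print("numpy 2.x detected: pandas compiled against numpy 1.x may fail "
              "with 'numpy.dtype size changed'.")
```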
   
If the user instead installs Sedona from within the notebook, Sedona runs, but this leads to various issues or kernel crashes.
   
If Sedona is installed as a cluster library, running any code results in:
```
Failure starting repl. Try detaching and re-attaching the notebook.
```
   
DBR ML 15 or lower installs ydata-profiling 4.5.1, which indirectly depends on numpy 1.23.x. However, Sedona's dependency rasterio 1.4.0, released yesterday, now requires `numpy>=1.24`. This conflict should make Sedona's installation fail, but it doesn't, because of how pip resolves dependencies and because `numpy<2` is not declared in Sedona's `setup.py`. As a result, the latest numpy (2.x) gets installed, which causes errors or crashes in Databricks because numpy 2 is incompatible with the pandas version required by DBR ML 15 or lower.

The workaround is to install `rasterio<1.4.0` before installing Sedona.
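The version conflict described above can be illustrated with a simplified model that treats each numpy requirement as a half-open version range and intersects them. This is a sketch under stated assumptions: the ranges are simplified from the comment (numpy 1.23.x via ydata-profiling, `numpy>=1.24` via rasterio 1.4.0, and the missing `numpy<2` cap), and the `intersect` helper is hypothetical, not how pip actually resolves.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]
# A half-open range [low, high); high=None means "no upper bound".
Range = Tuple[Version, Optional[Version]]


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Return the overlap of two half-open version ranges, or None if it is empty."""
    low = max(a[0], b[0])
    highs = [h for h in (a[1], b[1]) if h is not None]
    high = min(highs) if highs else None
    if high is not None and low >= high:
        return None
    return (low, high)


# numpy 1.23.x: indirect dependency of ydata-profiling 4.5.1 on DBR ML <= 15
NUMPY_123X: Range = ((1, 23), (1, 24))
# numpy>=1.24: required by rasterio 1.4.0
RASTERIO_140: Range = ((1, 24), None)
# numpy<2: the cap that is not declared in Sedona's setup.py
NUMPY_LT_2: Range = ((0,), (2,))

if __name__ == "__main__":
    # No numpy version satisfies both the 1.23.x pin and rasterio 1.4.0.
    print(intersect(NUMPY_123X, RASTERIO_140))
    # With a numpy<2 cap, rasterio 1.4.0 would be limited to 1.24 <= numpy < 2.
    print(intersect(RASTERIO_140, NUMPY_LT_2))
```

Real resolvers follow PEP 440 semantics (for example via `packaging.specifiers.SpecifierSet`), which this tuple model only approximates.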


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
