phillipleblanc opened a new issue, #31475: URL: https://github.com/apache/superset/issues/31475
### Bug description

Currently, Superset requires `pyarrow>=14.0.1,<15`, but this creates compatibility issues when working with databases that return StringView types (introduced in PyArrow 16).

I've tested Superset with PyArrow 18.1.0 and verified it works correctly in my (admittedly bare-bones) setup.

This update would:

1. Fix compatibility with databases returning StringView types
2. Allow users to work with newer Arrow-based databases and tools
3. Take advantage of performance improvements in newer PyArrow versions

Proposed change: update the pyarrow dependency in `pyproject.toml` from `"pyarrow>=14.0.1, <15"` to `"pyarrow>=14.0.1, <19"`.

### Screenshots/recordings

_No response_

### Superset version

master / latest-dev

### Python version

3.10

### Node version

Not applicable

### Browser

Not applicable

### Additional context

I'm using the https://github.com/influxdata/flightsql-dbapi DB API2 layer to query a database that returns native Arrow arrays. It is returning StringView types that pyarrow 14 can't understand. I force-upgraded to pyarrow 18.1 and it started working.
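To make the effect of the proposed pin change concrete, here is a minimal sketch that checks whether a given release falls inside the current and the proposed ranges. The helper names (`parse_version`, `in_range`, `installed_pyarrow`) are hypothetical and not part of Superset or PyArrow, and the comparison is a simplification that only handles plain numeric releases such as `18.1.0`; real resolvers apply full PEP 440 rules.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain release string such as '18.1.0' into (18, 1, 0).

    Short forms such as '19' are padded with zeros to three components.
    """
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True if lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


def installed_pyarrow() -> Optional[str]:
    """Return the installed pyarrow version, or None if it is not installed."""
    try:
        return metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current_pin = ("14.0.1", "15")   # "pyarrow>=14.0.1, <15"
    proposed_pin = ("14.0.1", "19")  # "pyarrow>=14.0.1, <19"
    tested = "18.1.0"                # the version reported as working above

    print("allowed by current pin: ", in_range(tested, *current_pin))
    print("allowed by proposed pin:", in_range(tested, *proposed_pin))
    print("installed pyarrow:      ", installed_pyarrow())
```

Under these assumptions, 18.1.0 is rejected by the current range and accepted by the proposed one, which is why the upgrade had to be forced.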
```console
2024-12-16 12:48:10,731:ERROR:flask_appbuilder.api:Unrecognized type: 24
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/api/__init__.py", line 110, in wraps
    return f(self, *args, **kwargs)
  File "/app/superset/views/base_api.py", line 127, in wraps
    raise ex
  File "/app/superset/views/base_api.py", line 121, in wraps
    duration, response = time_function(f, self, *args, **kwargs)
  File "/app/superset/utils/core.py", line 1470, in time_function
    response = func(*args, **kwargs)
  File "/app/superset/utils/log.py", line 255, in wrapper
    value = f(*args, **kwargs)
  File "/app/superset/databases/api.py", line 742, in table_metadata
    table_info = get_table_metadata(database, table_name, schema_name)
  File "/app/superset/databases/utils.py", line 67, in get_table_metadata
    columns = database.get_columns(table_name, schema_name)
  File "/app/superset/models/core.py", line 839, in get_columns
    return self.db_engine_spec.get_columns(
  File "/app/superset/db_engine_specs/base.py", line 1341, in get_columns
    cast(list[SQLAColumnType], inspector.get_columns(table_name, schema))
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 497, in get_columns
    col_defs = self.dialect.get_columns(
  File "<string>", line 2, in get_columns
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/reflection.py", line 55, in cache
    ret = fn(self, con, *args, **kw)
  File "/usr/local/lib/python3.10/site-packages/flightsql/sqlalchemy.py", line 87, in get_columns
    return connection.connection.flightsql_get_columns(table, schema)
  File "/usr/local/lib/python3.10/site-packages/flightsql/util.py", line 8, in g
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/flightsql/dbapi.py", line 173, in flightsql_get_columns
    reader = ipc.open_stream(table_schema)
  File "/usr/local/lib/python3.10/site-packages/pyarrow/ipc.py", line 190, in open_stream
    return RecordBatchStreamReader(source, options=options,
  File "/usr/local/lib/python3.10/site-packages/pyarrow/ipc.py", line 52, in __init__
    self._open(source, options=options, memory_pool=memory_pool)
  File "pyarrow/ipc.pxi", line 929, in pyarrow.lib._RecordBatchStreamReader._open
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Unrecognized type: 24
```

### Checklist

- [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [X] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [X] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
