henryharbeck commented on issue #2828:
URL: https://github.com/apache/arrow-adbc/issues/2828#issuecomment-3121459342
Hi @lidavidm, I had a crack at implementing this, but I think I am blocked
by the lack of union types in Polars. It would be good if you could confirm.
Here is a simple reproducer.
```py
import adbc_driver_sqlite.dbapi
import polars # noqa: F401
# Ensure no PyArrow
try:
    import pyarrow
except ImportError:
    pass
else:
    raise RuntimeError("Uninstall PyArrow")
conn = adbc_driver_sqlite.dbapi.connect()
# print(conn._backend)  # <adbc_driver_manager._dbapi_backend._PolarsBackend object...>
handle = conn._conn.get_info()
# print(type(handle))  # <class 'adbc_driver_manager._lib.ArrowArrayStreamHandle'>
conn._backend.import_array_stream(handle) # Panic
# Try direct constructors as well
# polars.from_arrow(handle) # Panic
# polars.DataFrame(handle) # Panic (also supports PyCapsule interface)
```
All three calls panic with the same message:
```
thread '<unnamed>' panicked at
crates/polars-core/src/datatypes/field.rs:256:19:
Arrow datatype Union(UnionType { fields: [Field { name: "string_value",
dtype: Utf8, is_nullable: true, metadata: None }, Field { name: "bool_value",
dtype: Boolean, is_nullable: true, metadata: None }, Field { name:
"int64_value", dtype: Int64, is_nullable: true, metadata: None }, Field { name:
"int32_bitmask", dtype: Int32, is_nullable: true, metadata: None }, Field {
name: "string_list", dtype: List(Field { name: "item", dtype: Utf8,
is_nullable: true, metadata: None }), is_nullable: true, metadata: None },
Field { name: "int32_to_int32_list_map", dtype: Map(Field { name: "entries",
dtype: Struct([Field { name: "key", dtype: Int32, is_nullable: false, metadata:
None }, Field { name: "value", dtype: List(Field { name: "item", dtype: Int32,
is_nullable: true, metadata: None }), is_nullable: true, metadata: None }]),
is_nullable: false, metadata: None }, false), is_nullable: true, metadata: None
}], ids: Some([0, 1, 2, 3, 4, 5]), mode: Dense }) not supported by Polars. You
probably need to activate that data-type feature.
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "/home/henry/development/temp/repro.py", line 22, in <module>
    conn._backend.import_array_stream(handle) # Panic
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/henry/development/temp/.venv/lib/python3.11/site-packages/adbc_driver_manager/_dbapi_backend.py", line 147, in import_array_stream
    return polars.from_arrow(handle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/henry/development/temp/.venv/lib/python3.11/site-packages/polars/convert/general.py", line 536, in from_arrow
    return pycapsule_to_frame(
           ^^^^^^^^^^^^^^^^^^^
  File "/home/henry/development/temp/.venv/lib/python3.11/site-packages/polars/_utils/pycapsule.py", line 41, in pycapsule_to_frame
    s = wrap_s(PySeries.from_arrow_c_stream(obj))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pyo3_runtime.PanicException: Arrow datatype Union(UnionType { fields: [Field
{ name: "string_value", dtype: Utf8, is_nullable: true, metadata: None }, Field
{ name: "bool_value", dtype: Boolean, is_nullable: true, metadata: None },
Field { name: "int64_value", dtype: Int64, is_nullable: true, metadata: None },
Field { name: "int32_bitmask", dtype: Int32, is_nullable: true, metadata: None
}, Field { name: "string_list", dtype: List(Field { name: "item", dtype: Utf8,
is_nullable: true, metadata: None }), is_nullable: true, metadata: None },
Field { name: "int32_to_int32_list_map", dtype: Map(Field { name: "entries",
dtype: Struct([Field { name: "key", dtype: Int32, is_nullable: false, metadata:
None }, Field { name: "value", dtype: List(Field { name: "item", dtype: Int32,
is_nullable: true, metadata: None }), is_nullable: true, metadata: None }]),
is_nullable: false, metadata: None }, false), is_nullable: true, metadata: None
}], ids: Some([0, 1, 2, 3, 4, 5]), mode: Dense }) not supported by Polars. You
probably need to activate that data-type feature.
```
Running the same call with the PyArrow backend gives a bit more context:
```py
# conn as above, but this time with PyArrow backend
handle = conn._conn.get_info()
reader = conn._backend.import_array_stream(handle)
tbl = reader.read_all()
print(tbl.schema)
# info_name: uint32 not null
# info_value: dense_union<string_value: string=0, bool_value: bool=1, int64_value: int64=2, int32_bitmask: int32=3 (... 94 chars omitted)
#   child 0, string_value: string
#   child 1, bool_value: bool
#   child 2, int64_value: int64
#   child 3, int32_bitmask: int32
#   child 4, string_list: list<item: string>
#       child 0, item: string
#   child 5, int32_to_int32_list_map: map<int32, list<item: int32>>
#       child 0, entries: struct<key: int32 not null, value: list<item: int32>> not null
#           child 0, key: int32 not null
#           child 1, value: list<item: int32>
#               child 0, item: int32
print(tbl.to_pylist())
# [
# {'info_name': 0, 'info_value': 'SQLite'},
# {'info_name': 1, 'info_value': '3.45.3'},
# {'info_name': 100, 'info_value': 'ADBC SQLite Driver'},
# {'info_name': 101, 'info_value': '(unknown)'},
# {'info_name': 102, 'info_value': '0.6.0'}
# ]
```
and for reference
```py
_KNOWN_INFO_VALUES = {
    0: "vendor_name",
    1: "vendor_version",
    2: "vendor_arrow_version",
    100: "driver_name",
    101: "driver_version",
    102: "driver_arrow_version",
    103: "driver_adbc_version",
}
```
Is there a way around the union type? Perhaps the "info_value" field could be
exposed to Python as a string instead? I do note that "driver_adbc_version" has
an int value (per the postgres test), unlike the other version info values, so
it would need to be cast back to an int afterwards.
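To make the string idea concrete, here is a rough sketch in plain Python. It
assumes the rows have already been read into dicts (as in the `to_pylist()`
output above). `stringify_info` is a hypothetical helper, not an existing
`adbc_driver_manager` function, and the value shown for code 103 is made up
for illustration.
```py
# Hypothetical sketch: not part of adbc_driver_manager.
KNOWN_INFO_VALUES = {
    0: "vendor_name",
    1: "vendor_version",
    2: "vendor_arrow_version",
    100: "driver_name",
    101: "driver_version",
    102: "driver_arrow_version",
    103: "driver_adbc_version",
}


def stringify_info(rows):
    """Map info codes to names and render every value as a string."""
    out = {}
    for row in rows:
        name = KNOWN_INFO_VALUES.get(row["info_name"])
        if name is None:
            continue  # skip codes that are not in the known mapping
        value = row["info_value"]
        out[name] = None if value is None else str(value)
    return out


rows = [
    {"info_name": 0, "info_value": "SQLite"},
    {"info_name": 1, "info_value": "3.45.3"},
    {"info_name": 100, "info_value": "ADBC SQLite Driver"},
    {"info_name": 103, "info_value": 1001000},  # made-up int value
]
info = stringify_info(rows)

# The one int-valued entry has to be cast back after the round trip.
adbc_version = int(info["driver_adbc_version"])
```
The downside is that the caller has to know which entries were originally
ints, which is the inconsistency mentioned above.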
Apologies if these questions are a bit naïve given I am only looking at
Python here.
I am keen to do this (or have it picked up by yourself or another dev), as
it is the final upstream piece of the related Polars issue mentioned in the
description. FWIW, union types may come in Polars, but not yet
(https://github.com/pola-rs/polars/issues/9112#issuecomment-3102111334)
Keen to hear your thoughts. Thanks