syntonym opened a new issue #693:
URL: https://github.com/apache/arrow-datafusion/issues/693
I'm trying to use datafusion from Python with the Python package datafusion
0.2.0 and pyarrow 4.0.1. Using a string datatype leads to `Exception: The type
13 is not valid` when trying to construct a dataframe (see code below).
It seems that, at least for my pyarrow 4.0.1, the string datatype has id 13
instead of the expected 21.
In datafusion-python the ids are set in python/src/types.rs, where 21 gets
mapped to UTF8 and 13 is not mapped because it is treated as unsupported.
Changing 21 to 13 there and rebuilding the package fixes the error, and
datafusion works with my data as expected. In arrow the type ids seem to come
from an enum in arrow/python/pyarrow/includes/libarrow.pxd, where string is the
21st entry. I thought that maybe I was using an old pyarrow version, but the
most recent code changes in that area are 13 months old.
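
To illustrate the mismatch described above, here is a minimal pure-Python sketch of an id-to-type lookup. It is not the actual datafusion-python code (that lives in Rust in python/src/types.rs and covers many more types); the names `EXPECTED_IDS`, `PATCHED_IDS` and `resolve_type` are hypothetical and exist only for this example.

```python
# Hypothetical stand-in for the id-to-type lookup; only the string entry
# from the report is modelled here.
EXPECTED_IDS = {21: "utf8"}  # mapping as described in the report
PATCHED_IDS = {13: "utf8"}   # mapping after the local change from 21 to 13


def resolve_type(type_id, table):
    """Return the type name for type_id, or fail like the reported error."""
    try:
        return table[type_id]
    except KeyError:
        raise Exception(f"The type {type_id} is not valid") from None


if __name__ == "__main__":
    try:
        resolve_type(13, EXPECTED_IDS)
    except Exception as exc:
        print(exc)  # The type 13 is not valid
    print(resolve_type(13, PATCHED_IDS))  # utf8
```

If the real lookup behaves like this sketch, any id that pyarrow reports but the table does not contain would surface as the same "not valid" exception.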
**To Reproduce**
```
import datafusion
import pyarrow

f = datafusion.functions

# One string column ("a") next to one integer column ("b").
batch = pyarrow.RecordBatch.from_arrays(
    [pyarrow.array(["a", "b", "c"]), pyarrow.array([4, 5, 6])],
    names=["a", "b"],
)

ctx = datafusion.ExecutionContext()
# Fails here with "Exception: The type 13 is not valid".
df = ctx.create_dataframe([[batch]])
```