syntonym opened a new issue #693:
URL: https://github.com/apache/arrow-datafusion/issues/693


   I'm trying to use datafusion from Python with the datafusion package 0.2.0 
and pyarrow 4.0.1. Using a string datatype leads to `Exception: The type 13 
is not valid` when constructing a dataframe (see the code below).
   
   It seems that at least for my pyarrow 4.0.1 the string datatype has id 13 
instead of the expected 21.
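
A quick way to check which id a local pyarrow install reports for strings is sketched below. The helper name `string_type_id` is made up for this illustration, and the guard that returns `None` when pyarrow is not installed is only there so the snippet runs anywhere. The value printed depends on the installed pyarrow.

```python
import importlib.util


def string_type_id():
    """Return pyarrow's numeric type id for strings, or None if pyarrow is absent."""
    if importlib.util.find_spec("pyarrow") is None:
        return None
    import pyarrow

    # DataType.id is the integer code pyarrow reports for the type.
    return pyarrow.string().id


# The report above observed 13 with pyarrow 4.0.1.
print("string type id:", string_type_id())
```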
   
   In datafusion-python the ids are set in python/src/types.rs, where 21 is 
mapped to UTF8 and 13 is not mapped because it is unsupported. Changing 21 
to 13 there and rebuilding the package fixes the error, and datafusion then 
works with my data as expected. In arrow the type ids seem to come from an 
enum in arrow/python/pyarrow/includes/libarrow.pxd, where string is the 21st 
entry. I thought I might be using an old pyarrow version, but the most recent 
code changes in that area are 13 months old.
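
The mismatch can be modelled in a few lines of plain Python. This is only a stand-in for the lookup described above: the real table is Rust code in python/src/types.rs, and the names `ORIGINAL_TABLE`, `PATCHED_TABLE` and `lookup` are hypothetical. It assumes the lookup simply rejects any id that has no entry.

```python
# Hypothetical Python stand-in for the id-to-type lookup described above.
# The real mapping is Rust code in python/src/types.rs; names are illustrative.

ORIGINAL_TABLE = {21: "Utf8"}  # as described: 21 mapped to UTF8, 13 unmapped
PATCHED_TABLE = {13: "Utf8"}   # the local change that made the error go away


def lookup(type_id, table):
    """Return the mapped type name, or raise the error quoted in the report."""
    try:
        return table[type_id]
    except KeyError:
        raise Exception(f"The type {type_id} is not valid") from None


reported_string_id = 13  # id the report observed for strings on pyarrow 4.0.1

print(lookup(reported_string_id, PATCHED_TABLE))
try:
    lookup(reported_string_id, ORIGINAL_TABLE)
except Exception as err:
    print(err)
```

With the original table the string id has no entry, so the lookup fails before any data is read. With the patched table the same id resolves to the UTF8 type.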
   
   **To Reproduce**
   
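   ```python
   import datafusion
   import pyarrow
   
   f = datafusion.functions
   
   # One string column ("a") and one integer column ("b").
   batch = pyarrow.RecordBatch.from_arrays(
       [pyarrow.array(["a", "b", "c"]), pyarrow.array([4, 5, 6])],
       names=["a", "b"],
   )
   
   ctx = datafusion.ExecutionContext()
   # Raises "Exception: The type 13 is not valid" because of the string column.
   df = ctx.create_dataframe([[batch]])
   ```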


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


