dlovell opened a new issue, #542:
URL: https://github.com/apache/arrow-datafusion-python/issues/542

   **Describe the bug**
   Working with a struct field inside a `udf` fails unless all the struct
fields are of type `string`.
   
   **To Reproduce**
   ```
   import pandas as pd
   import pyarrow.compute as pc
   import toolz
   from datafusion import (
       SessionContext,
       column,
       functions as f,
       udf,
   )
   
   
   def make_df(n=30):
       return pd.DataFrame(
           {
               "a": pd.date_range(start="2020-01-01", freq="M", periods=n),
               "b": range(n),
               "c": pd.Series(range(n)).add(0.1),
               "d": pd.Series(range(n)).astype(str),
           }
       )
       # Closing the DataFrame with `).astype(str)` instead makes every struct
       # field a str, and the failure does not occur.
   
   
   field_name = "c0"
   col_name = "bcd"
   
   
   ctx = SessionContext()
   t = ctx.from_pandas(make_df(), "t").select(
       column("a"),
       f.functions.struct(*(column(c) for c in col_name)).alias(col_name),
   )
   my_udf = udf(
       toolz.curry(pc.struct_field, indices=field_name),
       input_types=[t.schema().field(col_name).type],
       return_type=t.schema().field(col_name).type.field(field_name).type,
       volatility="volatile",
       name="extract_field",
   )
   ctx.register_udf(my_udf)
   t.select(my_udf(column(col_name)))
   """
   Exception: type_coercion
   caused by
   Error during planning: Coercion from [Struct([Field { name: "c0", data_type: 
Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field 
{ name: "c1", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: 
false, metadata: {} }, Field { name: "c2", data_type: Utf8, nullable: true, 
dict_id: 0, dict_is_ordered: false, metadata: {} }])] to the signature 
Exact([Struct([Field { name: "c0", data_type: Int64, nullable: true, dict_id: 
0, dict_is_ordered: false, metadata: {} }, Field { name: "c1", data_type: 
Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 
Field { name: "c2", data_type: Utf8, nullable: true, dict_id: 0, 
dict_is_ordered: false, metadata: {} }])]) failed.
   """
   ```
   
   **Expected behavior**
    I would expect no failure to occur, as is the case when all the data is
first cast to type `str`.
   
   **Additional context**
   Possibly related to #541.
   The reason I'm trying to pack multiple columns into a single struct column
is so that I can simulate running a `udaf` that accepts multiple columns, which
does not currently seem possible.
   

