dlovell opened a new issue, #542:
URL: https://github.com/apache/arrow-datafusion-python/issues/542
**Describe the bug**
Extracting a struct field inside of a `udf` fails during planning with a
`type_coercion` error unless all of the struct's fields are of type `string`.
**To Reproduce**
```
import pandas as pd
import pyarrow.compute as pc
import toolz
from datafusion import (
    SessionContext,
    column,
    functions as f,
    udf,
)


def make_df(n=30):
    return pd.DataFrame(
        {
            "a": pd.date_range(start="2020-01-01", freq="M", periods=n),
            "b": range(n),
            "c": pd.Series(range(n)).add(0.1),
            "d": pd.Series(range(n)).astype(str),
        }
    )
    # ).astype(str)
    # if all struct fields are str type, the failure does not occur


field_name = "c0"
col_name = "bcd"
ctx = SessionContext()
t = ctx.from_pandas(make_df(), "t").select(
    column("a"),
    f.functions.struct(*(column(c) for c in col_name)).alias(col_name),
)
my_udf = udf(
    toolz.curry(pc.struct_field, indices=field_name),
    input_types=[t.schema().field(col_name).type],
    return_type=t.schema().field(col_name).type.field(field_name).type,
    volatility="volatile",
    name="extract_field",
)
ctx.register_udf(my_udf)
t.select(my_udf(column(col_name)))
"""
Exception: type_coercion
caused by
Error during planning: Coercion from [Struct([Field { name: "c0", data_type:
Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field
{ name: "c1", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered:
false, metadata: {} }, Field { name: "c2", data_type: Utf8, nullable: true,
dict_id: 0, dict_is_ordered: false, metadata: {} }])] to the signature
Exact([Struct([Field { name: "c0", data_type: Int64, nullable: true, dict_id:
0, dict_is_ordered: false, metadata: {} }, Field { name: "c1", data_type:
Float64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} },
Field { name: "c2", data_type: Utf8, nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }])]) failed.
"""
```
**Expected behavior**
I would expect no failure to occur, as is the case when all of the data is
first cast to type `str`.
**Additional context**
Maybe related to #541
The reason I'm trying to pack multiple columns into a single struct column
is so that I can simulate running a `udaf` that accepts multiple columns, which
does not currently seem to be possible.