dadepo opened a new issue, #6485:
URL: https://github.com/apache/arrow-datafusion/issues/6485
### Is your feature request related to a problem or challenge?
I am not sure if this is a bug, feature request or things are working as it
should.
How can I fix an error of this kind
Background, I have a UDF that mimics json_build_object from postgres. This
UDF returns a MapArray as the returning value.
With a table like
```
let df = ctx.sql(r#"
select * from test"#).await?;
df.show().await?;
+-----------+-----------+-----------+
| clientid | name | parentid |
+-----------+-----------+-----------+
| c-string1 | n-string1 | p-string1 |
| c-string2 | n-string2 | p-string2 |
+-----------+-----------+-----------+
```
I can build a json like this fine
```
let df = ctx.sql(r#"
select json_build_object('name', name, 'type', 'test')
from test"#).await?;
df.show().await?;
+---------------------------------------------------------------------+
| json_build_object(Utf8("name"),test.name,Utf8("type"),Utf8("test")) |
+---------------------------------------------------------------------+
| {name: n-string1, type: test} |
| {name: n-string2, type: test} |
+---------------------------------------------------------------------+
```
But if I remove name or any other column that exist in the table, I get the
error
```
let df = ctx.sql(r#"
select json_build_object('type', 'test')
from test"#).await?;
df.show().await?;
Error: This feature is not implemented: Can't create a scalar from array of
type "Map(Field { name: "entries", data_type: Struct([Field { name: "keys",
data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata:
{} }, Field { name: "values", data_type: Utf8, nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0,
dict_is_ordered: false, metadata: {} }, false)"
```
A simple work around will be to always include a column from the table in
the query, but unfortunately i cannot guarantee this. Especially when I have
use case where json_build_object works on computed values or values created via
other udfs.
How can I make my UDF not dependent on the present of a column? For example
I can use the make_array function in this way. That is, I can use it without
including a column from the table
```
let df = ctx.sql(r#"
select make_array('type', 'test')
from test"#).await?;
df.show().await?;
+--------------------------------------+
| makearray(Utf8("type"),Utf8("test")) |
+--------------------------------------+
| [type, test] |
| [type, test] |
+--------------------------------------+
```
### Describe the solution you'd like
Ability to have a UDF not dependent on the present of a column, generate as
much results in a column as needed in the resulting record batch
### Describe alternatives you've considered
if I change the return type to UTF8 instead of Map, that is
```
let json_build_object_return_type: ReturnTypeFunction = Arc::new(move |_|
Ok(Arc::new(DataType::Utf8)));
```
And i return `StringArray` from the UDF, I get the behaviour I am looking
for. But this means the json structure returned from the UDF get formatted into
strings, which makes processing more difficult down the line as I end up with
values like this
```
let value = r#"\"{\\\"key_one\\\":\\\"val_one\\\"}\""#;
```
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]