linrrzqqq opened a new pull request, #63907:
URL: https://github.com/apache/doris/pull/63907
Problem Summary:
Fix Python UDF nested complex type conversion when `MAP` appears inside
`ARRAY`, `STRUCT`, or vectorized inputs.
Previously, Python UDF argument conversion mostly relied on PyArrow's
default conversions (`Scalar.as_py()`, `Array.to_pylist()`,
`Array.to_pandas()`). Those APIs convert a top-level Arrow `MAP` into
Python-friendly values in some paths, but nested `MAP` values are exposed as
lists of tuples. For example, `ARRAY<MAP<STRING, INT>>` could arrive in Python
as `[[('a', 1)]]` instead of `[{'a': 1}]`, so user UDF code saw nested
maps as `list` instead of `dict`.
This PR introduces a recursive Arrow-value conversion helper and applies it
consistently across Python UDF argument conversion paths. The helper manually
reconstructs Python values according to the Arrow type:
- `MAP` -> `dict`
- `LIST` / `LARGE_LIST` -> `list`
- `STRUCT` -> `dict`
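
A minimal sketch of the recursive idea follows. It is not the helper from this PR: to stay free of dependencies it uses hypothetical tuple-based type descriptors in place of real Arrow types, and the name `convert_value` is made up for illustration. It only shows how a type-directed walk turns list-of-tuples maps back into `dict` at any nesting depth.

```python
# Hypothetical type descriptors standing in for Arrow types (NOT PyArrow API):
#   ("map", key_type, value_type)
#   ("list", item_type)
#   ("struct", {field_name: field_type})
#   a plain string such as "int" or "string" for scalar types


def convert_value(value, typ):
    """Recursively rebuild a Python value so that every MAP becomes a dict."""
    if value is None or isinstance(typ, str):
        # Nulls and scalars pass through unchanged.
        return value
    kind = typ[0]
    if kind == "map":
        _, key_type, value_type = typ
        # Assumed input shape: a list of (key, value) tuples.
        return {
            convert_value(k, key_type): convert_value(v, value_type)
            for k, v in value
        }
    if kind == "list":
        return [convert_value(item, typ[1]) for item in value]
    if kind == "struct":
        # Assumed input shape: a dict keyed by field name.
        return {name: convert_value(value[name], ft) for name, ft in typ[1].items()}
    raise TypeError("unsupported type descriptor: {!r}".format(typ))


# ARRAY<MAP<STRING, ARRAY<INT>>> expressed with the hypothetical descriptors.
nested_type = ("list", ("map", "string", ("list", "int")))
raw = [[("a", [1, 2]), ("b", [3])], [("c", [4, 5, 6])]]
converted = convert_value(raw, nested_type)
print(converted)  # [{'a': [1, 2], 'b': [3]}, {'c': [4, 5, 6]}]
```

Driving the recursion from the declared type rather than from the runtime value means a genuine list of tuples is never mistaken for a map.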
before:
```sql
CREATE FUNCTION py_deep_nested_debug(ARRAY<MAP<STRING, ARRAY<INT>>>)
RETURNS STRING
PROPERTIES (
"type" = "PYTHON_UDF",
"symbol" = "evaluate",
"runtime_version" = "3.12.11",
"always_nullable" = "true"
)
AS $$
def evaluate(arr):
    if arr is None:
        return 'None'
    return 'outer_type={}, outer_repr={}'.format(type(arr).__name__, repr(arr))
$$;
SELECT py_deep_nested_debug([{'a': [1, 2], 'b': [3]}, {'c': [4, 5, 6]}]);
+-------------------------------------------------------------------------------+
| py_deep_nested_debug([{'a': [1, 2], 'b': [3]}, {'c': [4, 5, 6]}])             |
+-------------------------------------------------------------------------------+
| outer_type=list, outer_repr=[[('a', [1, 2]), ('b', [3])], [('c', [4, 5, 6])]] |
+-------------------------------------------------------------------------------+
```
now:
```text
SELECT py_deep_nested_debug([{'a': [1, 2], 'b': [3]}, {'c': [4, 5, 6]}]);
+-------------------------------------------------------------------------+
| py_deep_nested_debug([{'a': [1, 2], 'b': [3]}, {'c': [4, 5, 6]}])       |
+-------------------------------------------------------------------------+
| outer_type=list, outer_repr=[{'a': [1, 2], 'b': [3]}, {'c': [4, 5, 6]}] |
+-------------------------------------------------------------------------+
```