github-actions[bot] commented on code in PR #61729:
URL: https://github.com/apache/doris/pull/61729#discussion_r2993097871
##########
be/src/udf/python/python_server.py:
##########
@@ -352,10 +380,10 @@ def convert_python_to_arrow_value(value,
output_type=None):
# For list types, recursively convert elements
if output_type and pa.types.is_list(output_type):
element_type = output_type.value_type
- return [convert_python_to_arrow_value(v, element_type) for v in
value]
+ return [convert_python_to_arrow_value(v, element_type,
output_metadata) for v in value]
else:
Review Comment:
**Inconsistency note:** `output_metadata` is correctly propagated here for
list recursion, but the struct/tuple path (line 398) and map/dict path (lines
412-413) below do NOT pass `output_metadata` in their recursive calls. This
means LARGEINT (and IPV4/IPV6) inside struct or map output types won't be
converted correctly.
While this is a pre-existing issue for IPV4/IPV6 types as well, since you're
already modifying this function, consider also updating the struct and map
paths:
```python
# Struct path (line 397-400):
return tuple(
convert_python_to_arrow_value(v, output_type[i].type,
output_type[i].metadata)
for i, v in enumerate(value)
)
# Map path (lines 411-414):
converted_items = [
(convert_python_to_arrow_value(k, key_type, output_metadata),
convert_python_to_arrow_value(v, item_type, output_metadata))
for k, v in value.items()
]
```
##########
be/src/udf/python/python_udf_meta.cpp:
##########
@@ -30,15 +31,35 @@
namespace doris {
+// Create an Arrow Field with doris_type metadata for special types (e.g. IP,
LARGEINT)
+std::shared_ptr<arrow::Field> PythonUDFMeta::create_field_with_doris_metadata(
+ const std::string& field_name, const std::shared_ptr<arrow::DataType>&
arrow_type,
Review Comment:
**Code Duplication:** This `create_field_with_doris_metadata` method is
functionally identical to the free function `create_arrow_field_with_metadata`
in `arrow_row_batch.cpp` (which this PR also modifies). Since
`python_udf_meta.h` already `#include`s `format/arrow/arrow_row_batch.h`, you
can simply call `create_arrow_field_with_metadata(...)` directly in
`convert_types_to_schema` instead of introducing a duplicate private static
method.
This would:
- Eliminate duplicated logic that must be kept in sync
- Reduce the maintenance burden when adding new special types in the future
- Remove the need for the `#include <arrow/util/key_value_metadata.h>` added
here
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]