github-actions[bot] commented on code in PR #61729:
URL: https://github.com/apache/doris/pull/61729#discussion_r2993097871


##########
be/src/udf/python/python_server.py:
##########
@@ -352,10 +380,10 @@ def convert_python_to_arrow_value(value, 
output_type=None):
         # For list types, recursively convert elements
         if output_type and pa.types.is_list(output_type):
             element_type = output_type.value_type
-            return [convert_python_to_arrow_value(v, element_type) for v in 
value]
+            return [convert_python_to_arrow_value(v, element_type, 
output_metadata) for v in value]
         else:

Review Comment:
   **Inconsistency note:** `output_metadata` is correctly propagated here for 
list recursion, but the struct/tuple path (line 398) and map/dict path (lines 
412-413) below do NOT pass `output_metadata` in their recursive calls. This 
means LARGEINT (and IPV4/IPV6) inside struct or map output types won't be 
converted correctly.
   
   While this is a pre-existing issue for IPV4/IPV6 types as well, since you're 
already modifying this function, consider also updating the struct and map 
paths:
   ```python
   # Struct path (line 397-400):
   return tuple(
       convert_python_to_arrow_value(v, output_type[i].type, 
output_type[i].metadata)
       for i, v in enumerate(value)
   )
   
   # Map path (lines 411-414):
   converted_items = [
       (convert_python_to_arrow_value(k, key_type, output_metadata),
        convert_python_to_arrow_value(v, item_type, output_metadata))
       for k, v in value.items()
   ]
   ```



##########
be/src/udf/python/python_udf_meta.cpp:
##########
@@ -30,15 +31,35 @@
 
 namespace doris {
 
+// Create an Arrow Field with doris_type metadata for special types (e.g. IP, 
LARGEINT)
+std::shared_ptr<arrow::Field> PythonUDFMeta::create_field_with_doris_metadata(
+        const std::string& field_name, const std::shared_ptr<arrow::DataType>& 
arrow_type,

Review Comment:
   **Code Duplication:** This `create_field_with_doris_metadata` method is 
functionally identical to the free function `create_arrow_field_with_metadata` 
in `arrow_row_batch.cpp` (which this PR also modifies). Since 
`python_udf_meta.h` already `#include`s `format/arrow/arrow_row_batch.h`, you 
can simply call `create_arrow_field_with_metadata(...)` directly in 
`convert_types_to_schema` instead of introducing a duplicate private static 
method.
   
   This would:
   - Eliminate duplicated logic that must be kept in sync
   - Reduce the maintenance burden when adding new special types in the future
   - Remove the need for the `#include <arrow/util/key_value_metadata.h>` added 
here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to